I. REAL PARTIES IN INTEREST
The real party in interest is Monsanto Company, the parent of wholly-owned subsidiary 
DEKALB Genetics Corporation, the assignee of this application. 

II. RELATED APPEALS AND INTERFERENCES
There are no interferences or appeals for related cases. 

III. STATUS OF THE CLAIMS

Claims 1-100 were filed with the application on June 23, 2000. Claims 1-71, 74-77, 80-83,
85, 87 and 92-93 were canceled in a Preliminary Amendment filed concurrently with the
application. Claims 100-114 were added in a Supplemental Preliminary Amendment, mailed in
the case on November 1, 2001.

Claims 72-73, 78-79, 84, 86, 88-91, 94-110 were pending at the time of the final Office 
Action mailed in the case on September 27, 2002. After the filing of a Notice of Appeal and 
Appeal Brief by Appellants, the Examiner issued a Third Office Action mailed on September 27, 
2002. In a Response to the Third Office Action, Appellants canceled claim 94. 

A final Office Action was mailed by the Examiner on December 22, 2003. No 
amendments have been filed after the final Office Action. Claims 72-73, 78-79, 84, 86, 88-91, 
95-110 are therefore now pending and are the subject of the instant appeal. A copy of the 
appealed claims is attached as APPENDIX 1. 

IV. STATUS OF AMENDMENTS 
No amendments were made subsequent to the final Office Action. 
V. SUMMARY OF THE INVENTION
The invention relates to transgenic maize plants having increased starch content and 
extractability and methods of use thereof. Specification at page 6, line 13 to page 7, line 2. 
Increased starch content and extractability are obtained by the expression of RNA molecules that
are substantially identical or complementary to 19kD or 22kD a-zein plant seed storage proteins,
thereby decreasing the amount of the corresponding seed storage protein and increasing starch
content and extractability in the cells of the plant. Specification at page 6, line 13 to page 7, line
23.

VI. ISSUES ON APPEAL 

(1) Are claims 72-73, 78-79, 84, 86, 88-91 and 94-110 properly rejected under 35 
U.S.C. §112, second paragraph, as being indefinite? 

(2) Are claims 72-73, 78-79, 84, 86, 88-91 and 94-110 properly rejected under 35 
U.S.C. §112, first paragraph, as not being supported by an adequate written description in the 
specification? 

(3) Are claims 72-73, 78-79, 84, 86, 88-91 and 94-110 properly rejected under 35 
U.S.C. §112, first paragraph, as not being enabled by the specification? 

VII. GROUPING OF THE CLAIMS
The claims stand or fall together. 
VIII. SUMMARY OF THE ARGUMENT

A. The terms "augmented," "preselected," "substantially identical" and 
"substantially complementary" are not indefinite under 35 U.S.C. § 112, second paragraph, as 
they are used in the claims, as one of skill in the art would readily understand the meaning of the 
terms when viewed in light of the language of the entire claim, the teaching of the specification 
and the knowledge of those of skill in the art. 

B. The claims do not lack written description because they are directed to a discrete 
class of subject matter that is fully described in the specification. The specification describes the
structural characteristics of 19 kD or 22 kD a-zein plant seed storage proteins by exemplary 
coding sequences and conserved functional domains. In addition, more than 70 genes encoding 
zein protein were isolated prior to the filing of the application. The structural detail provided is 
well more than that required under §112. The claims therefore do not lack written description. 

C. The Examiner has failed to establish a prima facie case of lack of enablement 
under 35 U.S.C. § 112, first paragraph, and improperly ignores evidence submitted by Appellant 
affirmatively demonstrating the enablement of the claims. The alleged evidence presented in 
support of the rejection in fact supports the enablement of the claims and contradicts the 
rejections made. 

VIII. ARGUMENT 
A. The Claims Are Not Indefinite Under 35 U.S.C. §112, Second Paragraph

The Examiner has finally rejected claims 72-73, 78-79, 84, 86, 88-91 and 94-110 under 
35 U.S.C. §112, second paragraph, as being indefinite for failing to particularly point out the
subject matter which Applicant regards as the invention. The rejections and Appellants'
response thereto are set forth below. 
(1) Rejection of Claim 72 

(a) "Augmented" 

The Examiner asserts that the claim term "augmented" is unclear in that it has not been 
defined and it is allegedly not clear whether this indicates incorporation of a nucleic acid into the 
genome. However, the meaning of the term is clear from the language of the claim. Claim terms
must be viewed in the context of the claim in which they are found. In claim 72, a fertile
transgenic Zea mays plant is recited "the genome of which is stably augmented by a preselected
DNA sequence." (emphasis added). The claim further specifies that "said preselected DNA
sequence is transmitted through a complete normal sexual cycle of the transgenic plant to the
next generation" (emphasis added). In view of the claim language, the only reasonable
interpretation is that augmented refers to genetic transformation in the genome.

Were the foregoing not the case the term "genome" in claim 72 would be completely 
superfluous. Thus interpretations excluding involvement of the genome may not be made as was 
done by the Examiner. The claim further specifies that the preselected DNA is transmitted 
through a normal sexual cycle and that the genome is "stably augmented." This also indicates 
stable transformation in the genome. 

The Examiner has therefore failed to give the claims a fair and reasonable reading based 
on the language of the claim as a whole. Properly viewed, the term is fully definite. Removal of
the rejection is therefore respectfully requested. 

(b) "Preselected" 
The Examiner rejects the use of the term "preselected" as indefinite because it has not
been defined. In response, Appellants note that the term has a well known meaning in the art. 
Appellants direct the Examiner to the online version of the Merriam Webster® dictionary
(http://www.m-w.com/cgi-bin/dictionary), which gives the definition of the present tense of
preselected, "preselect", as being "to choose in advance usually on the basis of a particular 
criterion." (See Exhibit A) The dictionary meaning demonstrates the well known meaning of 
the term. There is nothing indefinite in the use of a term having a well known meaning in the art. 

In the final Office Action maintaining the rejection the Examiner further asserted that the
term was indefinite because, while acknowledging that "the Office understand the meaning of 
the word 'preselected"', the criteria by which the sequence is selected have allegedly not been 
defined. In response, it is noted that this is irrelevant. The only criterion that is necessary is that
the DNA be selected for use in the claim. One of skill in the art may use any criteria in selecting
the DNA for use in the claim. However, this does not render the claim indefinite because 
breadth does not equate to indefiniteness, as apparently believed here by the Examiner. The 
Action itself acknowledges that "preselected" has a known meaning. This alone establishes that 
one of skill in the art would understand the meaning of the claim and thus that the claim is 
definite. What criteria are used to preselect are irrelevant because they are not a part of the claim 
and there is no basis in law to attempt to make Appellants add such a criterion given that the full
metes and bounds of the claim are clear. Given that the scope of the claim is clear to those of
skill in the art, the claim is fully definite in compliance with the second paragraph of § 112.

Removal of the rejection is thus respectfully requested.

(2) Rejections of claims 72, 88 and 90

Claims 72, 88 and 90 were finally rejected as allegedly being indefinite for the recitation 
of "substantially identical" and "substantially complementary to all or a portion." In particular, it 
is stated that the recited phrases are vague and unclear and do not specify what portion or percent 
of the sequence applicants are referring to. 

In response, it is noted that the terms "substantially identical" and "substantially
complementary" are defined in the specification at page 12, lines 11-24. The use of the terms in
the claims is thus not indefinite. Additionally, the terms are further defined by the language of
claim 72. For example, claim 72, from which claims 88 and 90 depend, reads as follows:

A fertile transgenic Zea mays plant having an increased starch content, the genome of 
which is stably augmented by a preselected DNA sequence encoding an RNA molecule 
which is substantially identical, or complementary, to an mRNA encoding a 19kD or a 
22kD a-zein plant seed storage protein, wherein the preselected DNA sequence is 
expressed in the cells of the transgenic plant in an amount sufficient to decrease the 
amount of said seed storage protein and increase starch content in the cells of a plant 
which only differ from the cells of said transgenic plant in that said preselected DNA 
sequence is absent, and wherein said preselected DNA sequence is transmitted through a 
complete normal sexual cycle of the transgenic plant to the next generation.
(emphasis added)

As can be seen, the claim indicates that expression of the preselected DNA sequence
results in a decrease of the amount of seed storage protein and increased starch content in the 
cells of a plant. Claims 88 and 90 incorporate this limitation by dependency. Thus the meaning 
of the terms to one of skill in the art is clear, e.g., that substantially identical, or complementary 
must refer to a preselected DNA sequence encoding an RNA molecule that is sufficiently 
identical or complementary to an mRNA encoding a 19kD or a 22kD a-zein plant seed storage 
protein to reduce expression of a 19kD or 22kD a-zein seed storage protein. This is so because 
the claim specifies that the preselected DNA sequence is expressed in the cells of the transgenic 
plant in an amount sufficient to decrease the amount of seed storage protein and increase starch
content in the cells of a plant comprising the cell. Such a sequence could not represent a single
homologous base pair as alleged by the Examiner, as it is well known to those of skill in the art
that in order to form the type of stable complex required for antisense suppression, longer 
stretches of complementary sequences are required. 

One of skill in the art would thus fully understand the metes and bounds of the claim
because the claim is defined by discrete and readily ascertainable criteria. The test for
definiteness under 35 U.S.C. § 112, second paragraph, is whether "those skilled in the art would
understand what is claimed when the claim is read in light of the specification." Orthokinetics,
Inc. v. Safety Travel Chairs, Inc., 806 F.2d 1565, 1576, 1 USPQ2d 1081, 1088 (Fed. Cir. 1986).
If one skilled in the art is able to ascertain the meaning of the claim, 35 U.S.C. § 112, second
paragraph, is satisfied. Id. In view of the definition in the specification and claim language, the
referenced terms fully meet this requirement.

In conclusion, Appellants note that the rejected claims are fully definite in compliance 
with 35 U.S.C. § 112, second paragraph. Reversal of the rejection is thus respectfully requested.

B. The Claims Do Not Lack Written Description Under 35 U.S.C. §112, First 
Paragraph 

The Examiner added a rejection of claims 72-73, 78-79, 84, 86, 88-91, and 94-110 under 
the first paragraph of 35 U.S.C. §112 in the Third Office Action based on an alleged lack of
written description for the claimed invention. In particular, it was alleged that the specification 
does not describe the structural characteristics of 19 kD or 22 kD a-zein plant seed storage 
proteins. 

The rejection is utterly without merit and its reissuance and maintenance by the Examiner 
is puzzling to Appellants. Essentially the same rejection was issued in the first Office Action in 
this case, but was withdrawn in the second Office Action in view of the persuasive comments of
Appellants in the Response to the First Office Action. Overwhelming evidence supporting the 
structural characteristics of 19 kD or 22 kD a-zein genes was shown at that time. The Third 
Office Action nonetheless reissued the rejection and the Examiner has now made the rejection 
final. 

The Examiner has alleged that written description is lacking in particular because 
information was not provided regarding what sequences are substantially identical or 
complementary to an mRNA encoding 19 kD or 22 kD a-zein genes. However, as set forth 
above, the claim terms "substantially identical, or complementary" define a discrete class of 
subject matter that is well more than adequately described. As described, such sequences must 
hybridize in vivo to 19 kD or 22 kD a-zein genes so as to yield a decrease of expression of the 19 
kD or 22 kD a-zein genes. This subject matter is fully defined in the specification as set forth 
below. 

It is first noted that Appellants do not claim 19 kD and 22 kD a-zein plant seed storage 
protein genes per se, as this class of genes was known at the time the application was filed. 
Rather transgenic plants expressing these proteins or methods comprising the use thereof are 
claimed. The structural features unique to a maize 19 kD and 22 kD a-zein plant seed storage 
protein have been fully described in the specification. Accordingly, it cannot be said that
Appellants lack written description for the terms. For example, the Board's attention is directed
to FIG. 1 of the application. There shown are functional domains that are conserved and shared
among the zeins. Further attention is drawn to pages 1-3 of the application. There, the 
specification describes the family of known zeins, including 19 kD and 22 kD a-zeins. At page 
2, it is indicated, with a citation to Rubenstein (1982), that over 70 genes encoding zein protein 
have been isolated. Further on page 2, at lines 19-24, functional domains of 19 kD a-zeins are
described. The structural characteristics of the 19 kD and 22 kD a-zein plant seed storage 
proteins have thus been described in full compliance with 35 U.S.C. §112, first paragraph and Eli
Lilly is therefore inapplicable to the instant situation. 

The specification still further describes 19 kD and 22 kD a-zein seed storage proteins in 
the form of the nucleic acid sequence of the A20 and Z4 cDNAs, respectively. While these 
sequences are species of 19 kD and 22 kD a-zein protein genes, the species are representative of 
the genus as evidenced by Marks et al (1985) (Exhibit B), which was cited previously by the 
Examiner in an enablement rejection. Marks et al. demonstrates the common structural 
characteristics shared among 19 kD and 22 kD a-zeins. For example, in the first sentence of 
the Abstract of Marks et al, it is indicated that a comparison of the protein and DNA sequences 
of zein cDNA clones "reveals that they share extensive sequence homology and probably 
originated from a common ancestral gene." In the first paragraph of the Discussion section it is
indicated that cDNA sequences among the 19 kD and 22 kD group of a-zein sequences are 75 to 
95% and 92% homologous, respectively. Further, Marks et al provides sequence information 
and comparisons among 19 kD and 22 kD a-zeins. The disclosure of Marks et al. thereby 
demonstrates the shared structural characteristics of the 19 kD and 22 kD a-zein seed storage 
proteins. Combined with Appellants' disclosure of the structural characteristics of 19 kD and 22 
kD a-zein proteins, this is more than adequate to demonstrate compliance with the written 
description requirement. 

In view of the foregoing, Appellants assert that the written description requirement has
been fully satisfied. Reversal of the rejection under 35 U.S.C. §112, first paragraph is thus 
respectfully requested. 
C. The Claims Are Enabled Under 35 U.S.C. §112, First Paragraph

Claims 72-73, 78-79, 84, 86, 88-91 and 95-110 were finally rejected under the first 
paragraph of 35 U.S.C. §112 as allegedly not being enabled by the specification. In particular,
the Examiner alleged that the specification does not teach one of skill in the art how to increase 
the starch content or starch extractability or kernel hardness of seeds. It is stated that the 
specification only teaches how to make maize seeds with decreased amounts of the amino acid
leucine and increased lysine by transforming Zea mays plants with SEQ ID NOs:1 and 2
operably linked to a Z10 promoter.

Appellants submit that no basis has been provided to doubt the enablement of the claims 
and further that enablement has been affirmatively demonstrated. The Examiner attempts to
support the rejection by citing Coleman et al (1997) (Exhibit C). Although the Examiner 
dropped the reference of Marks et al (1985) (Exhibit B), this was cited in the enablement 
rejection in the first Office Action. As set forth below, both of these references affirmatively 
demonstrate the enablement of the invention. 

Coleman et al was cited as showing that high-lysine mutants exhibiting a reduction of a-zein
content were "concomitant with an inferior endosperm quality." It was thus suggested that,
because of the supposedly inferior endosperm quality, "reducing the a-zein protein content of 
maize seeds using the strategy of Appellants, will not increase the starch content or starch 
extractability of maize seeds." However, the reference does the very opposite and demonstrates 
the enablement of the claims. This "inferior" endosperm is in fact a soft and starchy endosperm. 
(See Coleman, p. 7094, paragraph 2). The reference therefore shows the direct correlation 
between increased lysine, decreased a-zein and soft and starchy endosperm. The first Office 
Action acknowledged this on page 5.^ Demonstration of this correlation supports enablement,
given that the Examiner already acknowledged that Appellants have demonstrated enablement
for increasing lysine during the prosecution. The fact that Coleman et al also demonstrated 
successful expression of a 24 kDa a-zein gene to induce the mutant phenotype provides still 
further evidence of enablement of the instant claims. 

It is additionally noted that whether the endosperm is subjectively "inferior" or not is 
irrelevant to enablement. What is considered inferior for a plant used for one purpose, for 
example, for human consumption, is not necessarily the same as for a plant used for production 
of corn starch. Further, an "inferior" endosperm does not equate to an inability to increase starch
content or extractability, as described in Coleman et al.

The other reference cited in the first Office Action, but apparently dropped for the Third
and Final Office Actions, also demonstrates enablement. Marks et al (1985) (Exhibit B), was
initially cited as indicating that there are many different forms of 19 kD and 22 kD a-zeins with
divergent and unpredictable functions. However, this reference supports enablement of the
claims. Marks et al states in the abstract that:

A comparison of the DNA and protein sequences of a group of zein cDNA clones 
reveals that they share extensive sequence homology and probably originated 
from a common ancestral gene. A comparison of clones corresponding to Mr 
22,000 polypeptides shows that they are 92% homologous, while five clones 
corresponding to Mr 19,000 zein vary in homology from 75 to 95%.

(emphasis added) 

Marks et al, therefore, demonstrates the common structural characteristics shared among 19 kD
and 22 kD a-zeins. In the first paragraph of the Discussion section of Marks et al, it is indicated
that cDNA sequences among the 19 kD and 22 kD group of a-zein sequences are 75 to 95% and
92% homologous, respectively. Further, Marks et al provides sequence information and
comparisons among 19 kD and 22 kD a-zeins. The disclosure of Marks et al thereby
demonstrates the shared structural characteristics of the 19 kD and 22 kD a-zein seed storage
proteins.

^ See middle paragraph: "Two 'high-lysine' mutants were identified, opaque2 (o2) and floury2 (fl2)

No basis has been provided by the Examiner to indicate that different isoforms have
"divergent functions." The high degree of homology and indicated common ancestor among 
zeins strongly contradicts this. Additionally, there is no support in the Marks et al reference for 
the prior contention that any of the mRNA isoforms described encode proteins other than zeins, 
the function of which is that of a seed storage protein. The reference, therefore, indicates the 
shared original and conserved structure of the 19 kD and 22 kD a-zein plant seed storage 
proteins. 

The rejection adds citation to the reference of Moonan (2002) (Exhibit D). The 
reference is alleged to show that sugarcane viral protection using less than 100% sequence 
identity resulted in "inferior viral protection." Appellants note however that this is irrelevant to
enablement of the current claims. First, this was sugarcane, not maize as in the instant claims, 
thus there is no showing how this is relevant to the instant claims. Second, what is subjectively 
"inferior" has absolutely nothing to do with enablement. Enablement concerns teaching one of 
skill in the art how to make and use the claimed invention, not make and use something that is 
subjectively superior. Indeed, that statement cited in the Action demonstrates that viral 
protection was achieved, but was apparently somehow subjectively inferior. Again, as long as 
one of skill in the art can make and use the invention, it is irrelevant if an invention is "inferior" 
to other techniques that may be available. Finally, the cited reference has to do with viral capsid 
expression to obtain viral resistance. Again, there is absolutely no basis provided to analogize
this to the current case. The current claims involve alteration of function of an endogenous
maize gene, whereas viral suppression has to do with infection from an external pathogen. The 
scientific principles are entirely different than for the claimed invention. Appellants do not claim 
viral resistance. No basis has therefore been provided to draw any analogies between these 
technologies. 

Further evidence of enablement of the claims is provided at page 83 of the specification. 
There it is described that endosperm cells in a maize kernel are made up primarily of large starch 
granules and protein sequestered in protein bodies. It is shown that a reduction in the number of 
protein bodies in endosperm cells derived from a transformant produced using a zein antisense 
construct was achieved. This was shown by light microscopy and the results given in FIG. 9. As 
can be seen in the figure, the results demonstrated a decrease in the amount of seed storage 
protein and therefore an increase in the relative starch content of the kernel. 

In conclusion, the Examiner has failed to set forth any basis for doubting the enablement 
of the claims. Appellants have further affirmatively set forth evidence of the enablement of the
claims that has not been contradicted by the Examiner. Reversal of the rejection is thus
respectfully requested.
IX. CONCLUSION
It is respectfully submitted, in light of the above, that none of the appealed claims are 
properly rejected. Therefore, Appellants respectfully request that the Board reverse the pending 
grounds for rejection. 

Respectfully submitted,




Reg. No. 42,628 

FULBRIGHT & JAWORSKI L.L.P Attorney for Appellants 

600 Congress Avenue, Suite 2400 
Austin, Texas 78701 
(512)536-3085 

Date: May 24, 2004 
APPENDIX 1: APPEALED CLAIMS



72. A fertile transgenic Zea mays plant having an increased starch content, the genome of 
which is stably augmented by a preselected DNA sequence encoding an RNA molecule 
which is substantially identical, or complementary, to an mRNA encoding a 19kD or a 
22kD a-zein plant seed storage protein, wherein the preselected DNA sequence is 
expressed in the cells of the transgenic plant in an amount sufficient to decrease the 
amount of said seed storage protein and increase starch content in the cells of a plant 
which only differ from the cells of said transgenic plant in that said preselected DNA 
sequence is absent, and wherein said preselected DNA sequence is transmitted through a 
complete normal sexual cycle of the transgenic plant to the next generation. 

73. A fertile transgenic Zea mays plant, the seeds of which have an increased starch 
extractability, the genome of said plant which is stably augmented by a preselected DNA 
sequence encoding an RNA molecule which is substantially identical, or complementary, 
to an mRNA encoding a 19kD or a 22kD a-zein plant seed storage protein, wherein the
preselected DNA sequence is expressed in the seeds of the transgenic plant in an amount
sufficient to decrease the amount of said seed storage protein and increase the starch
extractability of the seed relative to the amount of said seed storage protein and starch
extractability in the seeds of a plant which only differ from the seeds of said transgenic 
plant in that said preselected DNA sequence is absent, and wherein said preselected DNA 
sequence is transmitted through a complete normal sexual cycle of the transgenic plant to 
the next generation. 

78. A seed derived from the plant of claim 72 or 73, wherein the seed comprises said 
preselected DNA sequence. 

79. A progeny plant derived from the seed of claim 78, wherein the plant comprises said
preselected DNA sequence. 
84. The transgenic plant of claim 72 or 73, wherein the promoter comprises the 10 kD zein
promoter. 



86. The transgenic plant of claim 72 or 73, wherein the promoter comprises the 27kD zein 
promoter. 

88. The transgenic plant of claim 72 or 73, wherein the preselected DNA sequence, which 
encodes an RNA molecule substantially complementary to all or a portion of an mRNA 
encoding a seed storage protein, encodes an RNA molecule substantially complementary 
to all or a portion of an mRNA encoding 19 kD a-zein protein. 

89. The transgenic plant of claim 72 or 73, wherein the preselected DNA sequence, which 
encodes an RNA molecule substantially complementary to all or a portion of an mRNA 
encoding a seed storage protein, encodes an RNA molecule substantially complementary 
to all or a portion of an mRNA encoding a 22 kD a-zein protein. 

90. The transgenic plant of claim 72 or 73, wherein the preselected DNA sequence, which 
encodes an RNA molecule, substantially identical to all or a portion of an mRNA 
encoding a seed storage protein, encodes an RNA molecule substantially identical to all 
or a portion of an mRNA encoding a 19 kD a-zein protein. 

91. The transgenic plant of claim 72 or 73, wherein the preselected DNA sequence, which 
encodes an RNA molecule substantially identical to all or a portion of an mRNA 
encoding a seed storage protein, encodes an RNA molecule substantially identical to all 
or a portion of an mRNA encoding a 22 kD a-zein protein. 

95. The transgenic plant of claim 72 or 73, wherein the cell is transformed by a method 
selected from the group consisting of electroporation, microinjection, microprojectile 
bombardment, and liposomal encapsulation. 
96. The transgenic plant of claim 78 or 79, further comprising stably transforming the cells
with at least one selectable marker gene. 

97. A method of producing a Zea mays seed with an increased starch content, comprising: 

(a) growing a transgenic Zea mays plant, the genome of which is augmented 
with a preselected DNA sequence encoding an RNA molecule which is 
substantially identical, or complementary to an mRNA encoding a 19kD 
or a 22kD a-zein seed storage protein, wherein the preselected DNA 
sequence is expressed in the cells of the Zea mays plant in an amount 
sufficient to decrease the amount of seed storage protein; and

(b) selecting a seed from the transgenic Zea mays plant, wherein the seed has
an increased amount of starch relative to the amount of starch in a seed 
selected from a plant which does not comprise said preselected DNA 
sequence. 

98. A method of obtaining starch from a Zea mays seed, comprising: 

(a) growing a transgenic Zea mays plant, the genome of which is augmented 
with a preselected DNA sequence encoding an RNA molecule which is 
substantially identical, or complementary, to an mRNA encoding a 19kD 
or a 22kD a-zein seed storage protein, wherein the preselected DNA 
sequence is expressed in the cells of the Zea mays plant in an amount
sufficient to decrease the amount of seed storage protein; 

(b) obtaining seed from said plant; and

(c) extracting starch from the seed.

99. The method of claim 97 or 98 wherein the preselected DNA sequence is operably linked
to a promoter functional in plant cells. 

100. The method of claim 99 wherein the promoter comprises the 10 kD zein promoter. 

101. The method of claim 99 wherein the promoter comprises the 27 kD zein promoter. 
102. The method of claim 97 or 98 wherein the preselected DNA sequence encodes an RNA
molecule which is substantially identical to all or a portion of the mRNA encoding a seed 
storage protein. 

103. The method of claim 97 or 98 wherein the preselected DNA sequence encodes an RNA 
molecule which is substantially complementary to all or a portion of the mRNA encoding 
a seed storage protein. 

104. The method of claim 102 wherein the preselected DNA sequence encodes an RNA 
molecule substantially identical to all or a portion of mRNA encoding a 19 kD a-zein 
protein. 

105. The method of claim 102 wherein the preselected DNA sequence encodes an RNA 
molecule substantially identical to all or a portion of an mRNA encoding a 22 kD a-zein 
protein. 

106. The method of claim 103 wherein the preselected DNA sequence encodes an RNA 
molecule substantially complementary to all or a portion of an mRNA encoding a 19 kD 
a-zein protein. 

107. The method of claim 103 wherein the preselected DNA sequence encodes an RNA 
molecule substantially complementary to all or a portion of an mRNA encoding a 22 kD 
a-zein protein. 

108. The method of claim 97 or 98 wherein the genome of the transgenic Zea mays plant is 
further augmented with a DNA sequence encoding a polypeptide that provides the 
transgenic Zea mays plant with increased kernel hardness. 

109. The method of claim 97 or 98 wherein the transgenic Zea mays plant is produced from
cells transformed by a method selected from the group consisting of electroporation, 
microinjection, microprojectile bombardment, and liposomal encapsulation. 
110. The method of claim 97 or 98 wherein the genome of the transgenic Zea mays plant is
further augmented with at least one selectable marker gene. 

A comparison of the DNA and protein sequences of a group of zein cDNA clones reveals that they share extensive sequence homology and probably originated from a common ancestral gene. A comparison of clones corresponding to Mr 22,000 polypeptides shows they are 92% homologous, while five clones corresponding to Mr 19,000 zeins range in homology from 75 to 95%. The clones corresponding to the Mr 22,000 proteins are approximately 60% homologous to clones encoding the Mr 19,000 proteins. A clone corresponding to the Mr 15,000 zein has little homology to either the Mr 22,000 or 19,000 zeins. Clones corresponding to both the Mr 22,000 and 19,000 zeins have two putative polyadenylation signals. S1 nuclease mapping indicates that the first polyadenylation signal following the stop codon is utilized by the Mr 22,000 sequences, while primarily the second polyadenylation signal is utilized by the Mr 19,000 sequences.



To study the expression of zein genes during maize endosperm development, we constructed and characterized a number of full-length cDNA clones (Marks et al., 1985). On the basis of cross-hybridization studies, these clones were divided into nine distinct groups: one for the Mr 15,000 zein, five for the Mr 19,000 zeins, and three for the Mr 22,000 zeins. No clones for the Mr 10,000 or reduced soluble protein were isolated. However, the clones that were identified appear to represent a large proportion of the zein mRNA sequences in the endosperm mRNA population (Marks et al., 1985).

Sequences of clones related to some of those in the groups we describe have been reported (Heidecker and Messing, 1983; Geraghty et al., 1982; Spena et al., 1982). Previous studies revealed that genes encoding the Mr 19,000 and 22,000 zein groups are homologous and probably have a common ancestral sequence (Marks and Larkins, 1982; Spena et al., 1982), but the Mr 15,000 zein sequence was found to belong to an unrelated gene family.¹ Genes within the Mr 19,000 and 22,000 zein groups have diverged from one another through base substitutions, small insertions/deletions, and relatively large internal duplications (Heidecker and Messing, 1983; Marks and Larkins, 1982). Many of these genes contain two functional polyadenylation signals (Messing et al., 1983), and several of the genes appear to have multiple transcriptional start sites (Langridge and Feix, 1983).
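As an illustration of what locating two polyadenylation signals in a 3' noncoding region involves, the following minimal sketch scans a sequence downstream of the stop codon for a signal motif. The motif AATAAA is the canonical signal and is an assumption here, since the text does not spell it out; the function name and the toy sequence are hypothetical and are not taken from any zein clone.

```python
import re


def find_polya_signals(seq, stop_codon_end, motif="AATAAA"):
    """Return 0-based start positions of `motif` at or after `stop_codon_end`.

    `motif` defaults to the canonical AATAAA signal (an assumption; the
    paper does not state the motif it searched for).
    """
    seq = seq.upper().replace("U", "T")
    # A lookahead pattern reports overlapping occurrences as well.
    pattern = re.compile("(?=" + re.escape(motif) + ")")
    return [m.start() for m in pattern.finditer(seq, stop_codon_end)]


# Made-up 3' end: a stop codon (TAG) followed by two AATAAA motifs.
toy = "GCTTTCTAG" + "TTGC" + "AATAAA" + "CTTGCATCT" + "AATAAA" + "GTTC"
signals = find_polya_signals(toy, stop_codon_end=9)
print(signals)
```

Which of the reported positions is actually used in vivo cannot be read from the sequence alone; that is the question the S1 nuclease mapping addresses.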

This study presents a comprehensive comparison of sequences belonging to eight Mr 19,000 and 22,000 zein groups, and it presents the sequence of a full-length cDNA clone for a Mr 15,000 zein that contains a very long 5' noncoding region. We have also analyzed the usage of the polyadenylation signals for the Mr 19,000 zein genes.

* This is Journal Paper 10,319 of the Purdue Agricultural Experiment Station. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

† To whom correspondence should be addressed.

¹ K. Pedersen, P. Argos, S. V. L. Nayarana, and B. A. Larkins, manuscript in preparation.

MATERIALS AND METHODS 

DNA restriction endonucleases, Klenow fragment, T4 kinase, and S1 nuclease were purchased from Bethesda Research Laboratories. [γ-32P]ATP, [α-32P]dATP, and [α-32P]dCTP were purchased from New England Nuclear or Amersham Corp. Nitrocellulose was obtained from Schleicher and Schuell. Calf intestinal alkaline phosphatase was purchased from Boehringer Mannheim.

5'-Terminal End Labeling and DNA Sequencing — Restriction fragments were prepared for sequencing by the radioactive end-labeling method of Maxam and Gilbert (1980) for 5' ends as described by Pedersen et al. (1982). DNA fragments were sequenced by the method of Maxam and Gilbert (1980) with the modification of Krayev et al. (1980).

Computer Analysis — Sequences were analyzed using the Map and Gap programs developed by Devereux et al. (1984).

S1 Nuclease Mapping — Fragments for S1 nuclease mapping were either 5' end labeled as described by Maxam and Gilbert (1980) or 3' end labeled by filling in recessed 3' ends with Klenow fragment as described by Maniatis et al. (1982). End-labeled fragments were cleaved by restriction enzyme digestion and resolved on a 5% polyacrylamide gel. The fragments of interest were excised from the gel and isolated by the crush-soak method of Maxam and Gilbert (1980).

S1 nuclease mapping was performed by the method of Berk and Sharp (1977) as described by Maniatis et al. (1982). One µg of endosperm poly(A) RNA plus 200 µg of tRNA were added to 3 × 10^[exponent illegible] to 1 × 10^[exponent illegible] cpm of labeled fragment in 20 µl of 40 mM Pipes² (pH 6.4), 1 mM EDTA (pH 8.0), 0.4 M NaCl, and 80% formamide. Reactions were heated to 80 °C for 10 min and then incubated overnight at 58 °C for Z15 fragments (% G + C = 55%) and at 52 °C for Z19 fragments (% G + C = 48%). The hybridization reaction was stopped by adding 0.38 ml of cold S1 nuclease buffer (0.28 M NaCl, 0.05 M NaOAc (pH 4.6), 4.5 mM ZnSO4, and 20 µg/ml denatured calf thymus DNA) containing 400 units of S1 nuclease. The S1 digestion reactions were incubated at 37 °C for 30 min, after which they were extracted with phenol/chloroform and the DNAs precipitated with ethanol. Samples were dissolved in sequencing dye (80% formamide, 0.5 × TNB (10 mM Tris-HCl, pH 7.4, 15 mM NaCl, and 1 mM EDTA), 0.1% bromphenol blue, and 0.1% xylene cyanol) and loaded onto a 6% polyacrylamide sequencing gel that contained 50% urea.
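The overnight hybridization temperature above was chosen according to the G + C content of the labeled fragment. A minimal sketch of how a % G + C value can be computed from a fragment sequence follows; the function name and the example sequence are hypothetical, and the sketch does not attempt to reproduce the authors' choice of temperature.

```python
def gc_percent(seq):
    """Return the percentage of G and C among the unambiguous bases of `seq`."""
    bases = [b for b in seq.upper() if b in "ACGTU"]
    if not bases:
        raise ValueError("sequence contains no A, C, G, T, or U")
    gc = sum(1 for b in bases if b in "GC")
    return 100.0 * gc / len(bases)


# Made-up fragment, for illustration only.
fragment = "GGCCAATT"
print(f"% G + C = {gc_percent(fragment):.0f}%")
```

Ambiguity codes such as N are skipped rather than counted, which is one possible convention among several.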

Northern Blot Analysis — Four µg of poly(A) RNA from membrane-bound polysomes of 18 day-after-pollination endosperms was resolved by electrophoresis on an agarose gel that contained methyl mercury hydroxide (1.4% agarose, 5 mM methyl mercury hydroxide, 50 mM boric acid, 5 mM sodium borate, 10 mM sodium sulfate, and 0.01 mM Na2EDTA) (Bailey and Davidson, 1976). The gel was soaked in 0.5 M ammonium acetate and then stained with ethidium bromide. Gel slices that contained the upper and lower zein mRNA bands were excised and recast in separate 1.4% agarose gels. The RNA was transferred to nitrocellulose filters (Thomas, 1980), and the nitrocellulose was prehybridized for 8 h at 42 °C in 50% formamide, 5 × SSC (1 × SSC = 0.15 M NaCl, 0.015 M sodium citrate), 0.05 M sodium phosphate buffer, pH 6.8, 0.1% bovine serum albumin, 0.1% Ficoll, 0.1% polyvinylpyrrolidone, 200 µg/ml heat-denatured sheared calf thymus DNA. The nitrocellulose was then hybridized overnight at 42 °C in 50% formamide, 5 × SSC, 0.05 M sodium phosphate buffer, pH 6.8, 0.1% bovine serum albumin, 0.1% Ficoll, 0.1% polyvinylpyrrolidone, 200 µg/ml sheared calf thymus DNA, 5% dextran sulfate, and 7 × 10^[exponent illegible] cpm of the isolated zein cDNA inserts labeled with 32P by nick translation (Maniatis et al., 1976). Following hybridization the nitrocellulose was washed twice in 1 × SSC containing 0.1% sodium dodecyl sulfate at room temperature and twice in 1 × SSC containing 0.1% sodium dodecyl sulfate at 68 °C. Filters were air dried and autoradiographed.

² The abbreviation used is: Pipes, 1,4-piperazinediethanesulfonic acid.
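The hybridization and wash solutions are specified as multiples of SSC. As a small worked example of that arithmetic, the sketch below scales the 1 × SSC definition given in the text (0.15 M NaCl, 0.015 M sodium citrate) to any strength; the function and variable names are hypothetical.

```python
# Molar composition of 1 x SSC as defined in the text.
SSC_1X_MOLAR = {"NaCl": 0.15, "sodium citrate": 0.015}


def ssc_composition(fold):
    """Return the molar concentration of each SSC component at `fold` x strength."""
    if fold <= 0:
        raise ValueError("fold must be positive")
    return {name: fold * molar for name, molar in SSC_1X_MOLAR.items()}


# The prehybridization/hybridization (5 x) and wash (1 x) strengths used above.
for fold in (5, 1):
    parts = ", ".join(f"{m:.3f} M {n}" for n, m in ssc_composition(fold).items())
    print(f"{fold} x SSC: {parts}")
```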

RESULTS 

By cross-hybridization analysis at a stringent criterion (Tm −15 °C), we were able to distinguish nine groups of zein cDNA clones (Marks et al., 1985). The DNA sequences of several members from three of these groups were previously reported (Marks and Larkins, 1982; Pedersen et al., 1982).¹ In this study seven cDNA clones from the remaining six groups were completely sequenced by the method of Maxam and Gilbert (Fig. 1).

Analysis of Mr 22,000 Zein Sequences — Each of the clones, cZ22A-1 and cZ22B-1, encodes a complete zein protein, whereas the third, cZ22C-2, lacks a few coding bases on the 5' end (Fig. 2). Both cZ22B-1 and cZ22C-2 contain an entire 3' noncoding region, and cZ22A-1 contains a long 5' noncoding region. The cZ22B-1 and cZ22C-2 sequences align without gaps; however, when these clones were compared to cZ22A-1, four short gaps were introduced to maximize homology. The largest gap results in a 9-base pair deletion in cZ22A-1 relative to cZ22B-1 and cZ22C-2. Overall, the deletion/insertion events have left cZ22B-1 and cZ22C-2 with three more amino acids than cZ22A-1. The clones differ from one another by 6-[illegible]%, and these nucleotide differences are fairly evenly distributed throughout the sequences. The longest stretch of perfect homology occurs in the region of the stop codon and is only 56 nucleotides in length. Forty-four per cent of the base changes have resulted in amino acid substitutions. In general, the physical properties of the substituted amino acids are conserved. For example, seven of the 26 amino acid changes that occurred during the divergence of cZ22A-1 and cZ22B-1 involved exchanges between valine and alanine (Fig. 3). A few substitutions have not been conservative. For example, at position 103 (Fig. 3) alanine is replaced with aspartic acid,




Fig. 1. Restriction enzyme map and sequencing strategy of zein cDNA clones. Only selected restriction sites are shown. Horizontal arrows indicate the strategy followed for the determination of DNA sequences. The cDNA clones are: A, cZ22C2; B, cZ19A2; C, cZ19B1; D, cZ19C1; E, cZ19C2; F, cZ19D1; and G, cZ15A3. Restriction enzymes: B, BalI; Bm, BamHI; BnI, BanI; C, HincII; [two entries illegible]; N, NdeI; and P, PstI.



and at positions 177 and 227 glutamic acid is replaced with glutamine and alanine, respectively. One other substitution resulted in a tryptophan residue at position 258. This amino acid has been found only in zein sequences closely related to the cZ22C group (Spena et al., 1982; Geraghty et al., 1982).
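The figures quoted in this analysis (per cent homology between clones and the longest stretch of perfect homology) can be derived from a pairwise alignment. The following is a minimal sketch of one way to do so, not the Gap program used by the authors. It assumes the two sequences are already aligned to equal length with `*` marking gaps, as in the figure legends, and that gapped positions are left out of the percentage; that convention, the function name, and the toy sequences are assumptions.

```python
def compare_aligned(a, b, gap="*"):
    """Return (percent identity, longest run of consecutive identical positions).

    Positions where either sequence has a gap are excluded from the
    percentage and interrupt a run of perfect homology (an assumption).
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = run = longest = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == gap or y == gap:
            run = 0
            continue
        compared += 1
        if x == y:
            matches += 1
            run += 1
            longest = max(longest, run)
        else:
            run = 0
    if compared == 0:
        raise ValueError("no ungapped positions to compare")
    return 100.0 * matches / compared, longest


# Made-up aligned pair, for illustration only.
seq_a = "ATGGCT***GCAACC"
seq_b = "ATGACTTTGGCATCC"
identity, longest_run = compare_aligned(seq_a, seq_b)
print(f"{identity:.1f}% identical; longest perfect stretch: {longest_run} nt")
```

Counting each gapped position as a mismatch instead would lower the percentage, so the convention needs to be stated whenever such values are compared.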

Analysis of Mr 19,000 Zein Sequences — We identified five distinct groups of Mr 19,000 zein clones. Members of the four groups Z19A, Z19B, Z19C, and Z19D did not cross-hybridize at a stringent criterion, whereas those of the fifth group, Z19AB, cross-hybridized to members from Z19A and Z19B. The representative member of the Z19AB group, gZ19AB-1, is a subclone of the genomic clone ZG99, whose sequence was previously reported (Pedersen et al., 1982). The remaining groups are represented by cDNA clones, and their sequences are compared to that of gZ19AB-1 (Fig. 4).

Sequences from the groups Z19A, Z19B, and Z19AB are the most closely related and show 94-95% homology. As was the case with the Mr 22,000 zein clones, the nucleotide differences among these sequences show a fairly random distribution (Fig. 4). This is especially true of the cZ19A-2 and cZ19B-1 sequences; the longest stretch of perfect homology between them is less than 50 nucleotides. However, both of these clones have regions of up to 70 nucleotides that are identical to gZ19AB-1. The differences in the distribution of nucleotide changes between clone groups Z19A, Z19B, and Z19AB may explain their cross-hybridization behavior.

To compare members of the same group we determined the DNA sequences of cZ19C-1 and cZ19C-2 (Fig. 4). These two sequences differ by only one nucleotide in the 5' noncoding region and by four bases over the first 219 coding nucleotides. From that point to the end of the shorter clone, cZ19C-2, the sequences are identical. The four base differences lead to three conservative amino acid differences (Fig. 5).

To align the sequences of Z19A, Z19B, Z19AB, and Z19C groups, several gaps were introduced (Fig. 4). The largest is a 24-base pair deletion in the Z19A, Z19B, and Z19AB sequences (collectively referred to as Z19A/B) relative to the Z19C sequences. The insertion/deletion events that have taken place during the divergence of the genes corresponding to these clones have rendered the Z19C zeins with six more amino acids than the Z19A/B zeins. The Z19C sequences are on the average 85% homologous to the Z19A/B group. At the 3' end the average of 85% homology is maintained. However, in the 5' noncoding region these sequences are only 55-60% homologous.

To align cZ19D-1 with the other sequences, additional gaps had to be introduced (Fig. 4). The longest is a 24-base pair deletion in Z19D-1. This sequence has diverged more than the others and is only 75% homologous.

Five of the six clones encode complete zein polypeptides (Fig. 5). The remaining clone, cZ19A-2, is missing nucleotides for the first three codons. All of the complete polypeptides contain a 21-amino acid signal peptide and an internal tandemly repeated peptide of approximately 20 amino acids that has been previously described (Argos et al., 1982). The polypeptides range in size from 233 to 240 amino acids. As in the case of the Mr 22,000 zein clones, approximately 45% of the nucleotide differences among these clones result in amino acid substitutions. Most of the resulting changes are conservative; however, at a few positions neutral amino acids have been replaced with charged, and vice versa.
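The statement that a given fraction of nucleotide differences results in amino acid substitutions can be checked codon by codon. The sketch below is one simple way to do this under stated assumptions: the two coding sequences are in frame, ungapped, and of equal length, and every base difference inside a codon whose encoded amino acid changes is counted as a replacement difference (a simplification when a codon carries more than one change). The names and the toy sequences are hypothetical; the standard genetic code is used.

```python
from itertools import product

# Standard genetic code, with codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def classify_differences(cds_a, cds_b):
    """Return (total base differences, differences in codons whose amino acid changes)."""
    if len(cds_a) != len(cds_b) or len(cds_a) % 3:
        raise ValueError("expected equal-length, in-frame, ungapped coding sequences")
    total = replacement = 0
    for i in range(0, len(cds_a), 3):
        codon_a = cds_a[i:i + 3].upper()
        codon_b = cds_b[i:i + 3].upper()
        n_diff = sum(x != y for x, y in zip(codon_a, codon_b))
        total += n_diff
        if CODON_TABLE[codon_a] != CODON_TABLE[codon_b]:
            replacement += n_diff
    return total, replacement


# Made-up example: one valine-to-alanine change and two silent changes.
cds_1 = "ATGGCTGTTCAA"  # Met Ala Val Gln
cds_2 = "ATGGCCGCTCAG"  # Met Ala Ala Gln
total, replacement = classify_differences(cds_1, cds_2)
print(f"{replacement} of {total} base differences change the amino acid")
```

Deciding whether a substitution is conservative requires an additional grouping of amino acids by physical properties, which the sketch leaves out.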

Northern Blot Analysis — Previous analyses revealed that the mRNAs encoding the Mr 22,000 zeins can be resolved from the mRNAs for the Mr 15,000 and 19,000 zeins by gel electrophoresis under denaturing conditions (Wienand and Feix, 1978; Larkins et al., 1983). However, the mRNAs for the 

Fig. 2. Comparison of the nucleotide sequence of clones for the Mr 22,000 zeins. Nucleotides are numbered starting with the first base of the initiation codon and are indicated on the right-hand margin. The sequences of cZ22A1 and cZ22B1 (previously named pZ22.1 and pZ22.[digit illegible], respectively) were previously reported (Marks and Larkins, 1982). The complete nucleotide sequence of cZ22A1 is given, but only variable nucleotides at corresponding positions are listed for the other clones. Asterisks indicate positions where we have introduced gaps in the sequence to maximize homology. The positions of the first nucleotides for cZ22B-1 and cZ22C-2 follow the colons. An indicates the position at which a sequence terminates in a poly(A) tail. Positions corresponding to the initiating methionine (Met) and the NH2 terminus and COOH terminus of the encoded polypeptide are indicated.



Fig. 3. Comparison of amino acid sequences deduced from DNA sequences of clones for the Mr 22,000 zeins (Fig. 2). The complete sequence of cZ22A1 is given in the standard single letter amino acid code, but only variable amino acid residues at corresponding positions are listed for the other sequences. The position corresponding to the NH2 terminus of the mature protein is indicated. Numbering of amino acids begins with the first methionine in the signal peptide. Amino acids are divided into groups of 10 and include gapped positions.



Nucleotide Sequence of Zein mRNAs



Fig. 4. Comparison of the nucleotide sequence of clones for the Mr 19,000 zeins. The nucleotide sequence of gZ19AB1 (previously named ZG99) was previously reported (Pedersen et al., 1982). The complete sequences of gZ19AB1, cZ19D1, and cZ19C2 are given. The sequences of the cZ19A2 and cZ19B1 follow the colons, and only nucleotides in the sequences that vary from gZ19AB1 are shown. Likewise, the cZ19C1 sequence begins after the colon and only nucleotides in that sequence which vary from cZ19C2 are shown. An indicates a sequence terminating in poly(A) tail. Asterisks indicate positions where we have introduced gaps in the sequences to maximize homology. The putative polyadenylation signals are underlined. Positions corresponding to the initiating methionine (Met) and the NH2 terminus and COOH terminus of the encoded polypeptide are indicated.

Fig. 5. Comparison of amino acid sequences deduced from the DNA sequence of clones for the Mr 19,000 zeins (Fig. 4). The complete amino acid sequences deduced from gZ19AB1, cZ19D1, and cZ19C1 are given. Only amino acid residues for cZ19A2 and cZ19B1 that differ from gZ19AB1 are shown, as are only amino acid residues for cZ19C2 that differ from the sequence of cZ19C1. Asterisks indicate gaps in the amino acid sequence. The positions of the NH2 terminus and COOH terminus of the mature proteins are indicated. The numbering of amino acids begins with the first methionine in the signal peptide. Amino acids are divided into groups of 10 and include gapped positions.

Fig. 6. Northern blot analysis of zein mRNAs. A, 18 DAP endosperm poly(A) RNA from membrane-bound polyribosomes was resolved on 1.4% agarose gels that contained 5 mM methyl mercury. RNA from gel slices that contained the top (T) and bottom (B) zein mRNA bands was transferred to nitrocellulose and hybridized to nick-translated zein cDNA inserts as described under "Materials and Methods." The cDNA probes were: B, cZ15A3; C, cZ19C1; D, cZ22B1; and E, gZ19AB1.

Mr 15,000 and 19,000 zeins have not been resolved. We used a modified Northern blot analysis to determine if our clone groups correspond to distinct mRNA populations encoding the Mr 22,000, 15,000, or 19,000 zeins.

Poly(A) RNA was isolated from membrane-bound polysomes of maize endosperm and resolved on a methyl mercury agarose gel (Fig. 6A). The upper and lower bands corresponding to zein mRNAs were excised, blotted onto separate nitrocellulose filters, and hybridized to representative clones. Probes representing the Z15 and Z19C groups hybridized only to the lower band (Fig. 6, B and C). The cZ22B-1 probe hybridized only to the upper band under conditions that allowed cross-hybridization between the three closely related Mr 22,000 zein groups (Fig. 6D). These results indicate that the mRNA populations corresponding to the respective clone groups are homogeneous and probably encode only a single size class of protein. The 19AB probe hybridized to both the upper and lower bands (Fig. 6E), a result which is consistent with the existence of two subsets of mRNAs corresponding to this group (Hu et al., 1982; Heidecker and Messing, 1983). Presumably, one group of mRNAs encodes Mr 19,000 zeins and the other encodes Mr 22,000 zeins.

Analysis of a Mr 15,000 Zein cDNA Clone — We recently compared the sequence of a cDNA clone and a genomic clone encoding a Mr 15,000 zein.¹ The cDNA clone lacks a portion of the 5' coding region but contains the entire 3' noncoding sequence of its template mRNA. Except for one nucleotide difference, the cDNA sequence is identical to the genomic sequence for its entire length.

We report here the sequence of an additional cDNA clone that encodes a complete Mr 15,000 zein polypeptide (Fig. 7). This sequence differs from the genomic clone¹ by a few nucleotides. Since we estimated that there may only be two genes for the Mr 15,000 zeins (Wilson and Larkins, 1984), these two sequences may represent them. However, some of the base differences between the clones may be due to sequencing artifacts. Because of the high G + C content, many regions of the sequencing gels were compressed, making the sequence determination extremely difficult. Essentially all of the base differences were in these regions.

Previous S1-mapping data suggested that the 5' end of the mRNA for the Mr 15,000 gene began 65-70 nucleotides before the start codon (see Fig. 7). However, this larger cDNA clone has over 150 noncoding nucleotides at the 5' end that align perfectly with the 5' noncoding sequence of the genomic clone. The existence of this cDNA clone indicates that transcription may originate from two regions of the gene. To investigate this, we performed an S1-mapping analysis. A HindIII-DdeI fragment that contains the 5' end of the mRNA sequence was hybridized to poly(A) RNA isolated from staged endosperms (Fig. 8A) and the hybrids digested with S1 nuclease. The majority of protected fragments ended in the same region previously shown to be the 5' end of the mRNA. However, upon longer exposure of the gel we detected some transcripts that protected the complete noncoding region of the longer cDNA clone (Fig. 8B). While these results are consistent with the possibility of a major and minor origin of transcription, they do not exclude processing of a larger transcript for this gene. The longer transcript appears to be more abundant at early stages of endosperm development, as is indicated by the greater protection of the larger fragment with poly(A) RNA from 12 DAP endosperms than from later stages (Fig. 8B). Further analysis of the transcriptional activity of the Mr 15,000 gene will be presented elsewhere.³

The last 23 nucleotides of the 3' end of the cZ15A3 were 
not homologous to the genomic clone. The terminal 28 nu- 
cleotides were found to be an inverted copy of the sequence 

³ R. S. Boston and B. A. Larkins, manuscript in preparation.

Fig. 7. Nucleotide sequence of a cDNA clone encoding a Mr 15,000 zein protein. The predicted amino acid sequence is shown below the nucleotide sequence. XXX indicates the major transcriptional start site, and the underscored sequence corresponds to the putative polyadenylation signal. The overscored region indicates the position of an inverted nontandem repeated sequence. The NH2-terminal and COOH-terminal positions of the mature polypeptide are indicated.

Fig. 8. S1 mapping of the 5' region of cZ15A3. A, a partial restriction enzyme map of the 5' end of cZ15A3 and a region of the clone used in the S1-mapping experiment. G14 indicates a G tract of 14 bases that was added during the cloning procedure. The restriction enzyme sites upstream from the G tract are in the polylinker region of the plasmid vector pUC8. The probe was generated by end labeling a DdeI fragment with subsequent cleavage at the HindIII site in the polylinker. The vertical arrows indicate the limits of S1 digestion as shown in B. B, Maxam and Gilbert sequencing reactions of the probe are shown in lane 1 (G reaction), lane 2 (G + A reaction), lane 3 (C + T reaction), and lane 4 (C reaction). The probe was hybridized to 1 µg of poly(A) RNA from 12 DAP (lane 5), 18 DAP (lane 6), 22 DAP (lane 7), and 28 DAP (lane 8) endosperms as described under "Materials and Methods." After the hybridization the probe was digested with S1 nuclease and resolved on a sequencing gel. The sequence in the region of the nuclease-protected bands and the location of the bands (arrows) are given on the left-hand margin. Restriction enzyme key: D, DdeI; H, HindIII; N, NdeI; and P, PstI.

Fig. 9. Mapping of the 3' region of cZ15A3. A, a partial restriction enzyme map of the 3' terminal of cZ15A3 and the region of the clone used in the S1-mapping experiment. The C13 represents a C tract of 13 bases that was added during the cloning procedure (Marks et al., 1985). The restriction enzyme sites downstream from the C tract are in the polylinker region of the plasmid vector, pUC8. The probe was generated by cleavage with HinfI, and it was 3' end labeled by using Klenow fragment. The labeled fragment was recut with HincII. The horizontal arrows show the position of inverted repeats (see Fig. 7), and the vertical arrows indicate the limit of S1 digestion as shown in B. B, Maxam and Gilbert sequencing reactions of the probe are shown in the first two lanes as indicated. The probe was hybridized to 1 µg of 18 DAP endosperm poly(A) RNA plus 200 µg tRNA (+RNA) or 200 µg of tRNA without poly(A) RNA (−RNA) as described under "Materials and Methods." After hybridization the probe was digested with S1 nuclease and resolved on a sequencing gel. The sequence in the region of the nuclease-protected bands and the location of the bands (arrows) are given at the left-hand margin. Restriction enzymes: B, BalI; C, HincII; H, HinfI; and P, PstI.



Fig. 10. Determination of the 3' terminus of mRNAs closely related to the Z19A/B clones. A, a partial restriction enzyme map of the 3' coding and noncoding sequence of the clone gZ19AB1 (also called ZG99, Pedersen et al., 1982) and the region of the clone used in a S1-mapping experiment. The probe was made by cleavage with BanI followed by 3' end labeling with Klenow fragment and a second restriction digestion with HincII. The vertical arrows indicate the limits of S1 digestion as shown in B. B, Maxam and Gilbert sequencing reactions of the probe are shown in lane 1 (G + A reaction) and lane 2 (C + T reaction). The probe was hybridized to 1 µg of 18 DAP endosperm poly(A) RNA plus 200 µg of tRNA (lane 3) and to 200 µg of tRNA alone (lane 4) as described under "Materials and Methods." After hybridization the probe was digested with S1 nuclease and resolved on a sequencing gel. The sequences in the region of the nuclease-protected bands and the location of the bands (arrows) are given on the left-hand margin of B. Restriction enzymes: BnI, BanI and H, HincII.






found between nucleotides 614 and 642 (Fig. 7). To determine
if the 3' end of the cDNA is authentic, we used a HinfI to
HincII fragment 3' end labeled at the HinfI site to probe
endosperm poly(A) RNA (Fig. 9A). The only protected fragments
extended a few bases into the terminal inverted repeat
(Fig. 9B). Thus, this inverted repeat is apparently a cloning
artifact and is not due to sequence heterogeneity.

Analysis of Polyadenylation. Many of the Mr 19,000 zein
genes were found to contain two polyadenylation signals
(Messing et al., 1983). To quantitate the relative utilization
of these signals in the genes belonging to the Z19A/B groups
(Fig. 4) we did an S1-mapping experiment. A BanI to HincII
fragment from the genomic clone 19AB-1 was 3' end labeled
on the BanI site (Fig. 10A) and hybridized to endosperm
poly(A) RNA (Fig. 10B). The probe should be protected by
mRNA corresponding to the Z19A/B groups since these sequences
differ by only a few nucleotides in this region. Two
major fragments that differ in size by seven nucleotides were
protected (Fig. 10B). Both of these fragments extend past the
second polyadenylation signal. Furthermore, the shorter and
longer fragments end precisely at the last nucleotides found
in cZ19A-2 and cZ19B-1, respectively (Fig. 4). Several additional
bands, fainter in intensity, correspond to fragments
ending three bases after the first polyadenylation signal. Since
addition of the poly(A) tail normally occurs within 10-30
nucleotides of the polyadenylation sequence (Nevins, 1983),
these may not represent authentic 3' ends of the transcripts.
They may instead represent a region of instability due to a
long tract of Ts in this area. A few faint bands corresponding
to mRNAs terminating 10-40 nucleotides beyond the first
polyadenylation signal also appeared.

Heidecker and Messing (1983) have sequenced two cDNAs
closely related to 19A/B groups that terminate after the first
polyadenylation signal. Thus, the first site is probably used,
but at a much lower frequency than the second. Similar results
were obtained with probes from the 19C and 19D groups (data
not shown).

DISCUSSION 

To study the heterogeneity among the zein mRNAs in the
inbred W64A, we characterized the structure of a number of
zein cDNA clones. Each of the three groups of Mr 22,000 zein
sequences is 60-70% homologous to the Mr 19,000 zein sequences.
Sequences among the Mr 22,000 groups showed 92%
homology. The Mr 19,000 sequence groups have
homologies ranging from 75 to 95%. The relationships among
all the zein sequences are illustrated in Fig. 11, which
suggests the order of duplications that gave rise to these
groups.

Hu et al. (1982) characterized a genomic clone, Z4, closely
related to our Z19A/B groups that contains a large internal
duplication. The duplication adds 30 amino acids to the
encoded polypeptide and makes it the size of a Mr 22,000 zein.
By Northern blot analysis we found that in the W64A inbred
this type of duplication appears to be restricted to genes
within the Z19A/B group (Fig. 6).

The duplication event suggests that the Mr 22,000 zeins
may have originated from a Mr 19,000 gene, and close examination
of the Mr 22,000 sequences provides evidence for this.
The sequence from nucleotides 264-396 is 85% homologous
to the sequence from nucleotides 397-540. The duplication
that formed this region would have added approximately 40
amino acids to the encoded polypeptide. A series of base
substitutions would account for the subsequent diversion of
the genes into two distinct gene families.
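
The per cent homology quoted for the internal duplication can be illustrated with a minimal Python sketch. It
assumes the two segments have already been aligned to equal length, with "-" marking gaps. The function names
and the two 20-nucleotide strings are hypothetical stand-ins, not zein sequence. Only the coordinates 264-396
and 397-540 come from the text.

```python
def segment_length(start: int, end: int) -> int:
    """Length in nucleotides of a segment given inclusive 1-based coordinates."""
    return end - start + 1


def percent_identity(seq_a: str, seq_b: str) -> float:
    """Per cent of compared positions that match in two pre-aligned sequences.

    Positions where either sequence has a gap ("-") are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be pre-aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped positions to compare")
    return 100.0 * matches / compared


# Hypothetical aligned repeats (illustration only, not zein sequence).
repeat_1 = "CAACAGCAACAATTGCTTCC"
repeat_2 = "CAACAACAGCAATTACTTCC"

print(segment_length(264, 396))              # length of the first segment
print(segment_length(397, 540))              # length of the second segment
print(percent_identity(repeat_1, repeat_2))  # per cent identity of the toy pair
```

The two segments named in the text differ in length (133 and 144 nucleotides), so a real comparison would need
a gapped alignment before a function such as this one could be applied.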




Fig. 11. Comparison of sequence homology among zein
mRNAs. The phylogeny was reconstructed with the method described
by Fitch and Margoliash (1967) from the approximate per
cent of nucleotide substitutions that have taken place among the zein
genes. The percentages listed are the approximate per cent of nucleotide
substitution between putative gene branching points.



We have also sequenced a cDNA clone for the Mr 15,000
zein which is nearly identical to that obtained for a genomic
clone encoding this protein. The sequences of the Mr 15,000
zeins show no homology to those for either the Mr 19,000 or
22,000 zeins, indicating that they comprise a separate family
of genes. The Mr 15,000 zeins have a much higher G + C
content, lack an internal repeating nucleotide sequence, and
are present in only one or two copies in the genome (Wilson
and Larkins, 19??). According to S1 mapping, the mRNA
encoding the Mr 15,000 zein primarily begins 65-70 nucleotides
before the start codon (Fig. 8). The sequence of cZ15A3
differs from this by having a 5' noncoding sequence of over
150 nucleotides. Based on S1 mapping we estimate that the
longer mRNA is 100-1000-fold less abundant than the major
mRNA species (Fig. 8). Typical "TATA" boxes (Efstratiadis
et al., 1980) are found in the genomic clone just upstream
from the start of both mRNA species, suggesting that both
start sites may be used. But it is also possible that the larger
transcript is a precursor of the smaller one. The existence of
zein genes with two transcriptional promoters is not without
precedence. Langridge and Feix (1983) analyzed a gene that
encodes a Mr 22,000 zein that appears to have two promoters.
However, these promoters are separated by over 1000 nucleotides,
and they have nearly equal levels of transcriptional
activity.

Of the five zein cDNA clones that contained poly(A) tails
only three had the consensus sequence "AATAAA" which is
found on the 3' terminus of many animal genes (Nevins,
1983). This sequence was located 50-100 nucleotides before
the start of the poly(A) tail, whereas in animal genes the
polyadenylation signal is generally 10-30 nucleotides upstream
from the poly(A) tail. Messing et al. (1983) identified
several putative variants of the polyadenylation signal in zein
cDNA sequences that were located closer to the poly(A) tail.
In the A20 and A30 clones, which are closely related to our
Z19C and Z19A/B groups, respectively, they found the sequence
"AATAAG." We also found this sequence on the 3'
terminus of the Z19C and Z19A/B clones (Fig. 4). In their
B49 sequence (closely related to the Z22C group) another
putative variant, "AATAAT," was found (Geraghty et al.,
1982). This sequence is also found on the 3' terminus of our
Z22B and Z22C clones (Fig. 2).

In addition to the variation in the sequence of the polyadenylation
signal, some plant genes also differ from animal
genes by having two potential polyadenylation sites (Messing
et al., 1983). Hu et al. (1982) reported the sequence of a
genomic clone, Z4, that has a "AATAAA" sequence and a
"AATAAG" sequence 26 and 65 bases, respectively, downstream
from the stop codon. They also isolated cDNA clones
that terminated in a poly(A) tail after each of these sequences
(Heidecker and Messing, 1983). These clones are closely related
to our cZ19A2 and cZ19B1 clones which end in a poly(A)
tail after the second putative polyadenylation signal.
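
The search for canonical and variant polyadenylation signals described above can be sketched in a few lines of
Python. This is an illustration only. The three hexamers are those named in the text, while the function name
and the 3' noncoding sequence are hypothetical. The synthetic sequence simply places "AATAAA" and "AATAAG" with
26 and 65 nucleotides preceding them, loosely echoing the spacing reported for Z4.

```python
import re

# Hexamers named in the text: the consensus signal and two putative variants.
POLY_A_SIGNALS = ("AATAAA", "AATAAG", "AATAAT")


def find_polya_signals(utr, signals=POLY_A_SIGNALS):
    """Locate candidate polyadenylation signals in a 3' noncoding sequence.

    Returns a list of (signal, offset, distance_to_end) tuples sorted by offset,
    where offset is the number of nucleotides preceding the signal and
    distance_to_end is the number of nucleotides between the signal and the
    3' end of the supplied sequence.
    """
    utr = utr.upper()
    hits = []
    for signal in signals:
        # A lookahead pattern also reports overlapping occurrences.
        for match in re.finditer("(?=" + signal + ")", utr):
            offset = match.start()
            distance_to_end = len(utr) - (offset + len(signal))
            hits.append((signal, offset, distance_to_end))
    return sorted(hits, key=lambda hit: hit[1])


# Synthetic sequence (hypothetical): GC filler with two signals inserted.
utr = "GC" * 13 + "AATAAA" + "GC" * 16 + "G" + "AATAAG" + "GC" * 7 + "G"

for signal, offset, distance in find_polya_signals(utr):
    print(signal, offset, distance)
```

With a cDNA sequence trimmed at the start of its poly(A) tail, the third value is the signal-to-tail spacing
that the text compares with the 10-30 nucleotides typical of animal genes.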

Our results indicated that most of the zein mRNA that
corresponds to the Z19A/B clones terminated after the second
polyadenylation sequence. This appears to be in contrast to
mRNAs closely related to the Mr 22,000 clone groups (Z22A,
B, C). Three of four of the cDNA clones which belong to these
groups represented mRNAs that were polyadenylated after
the first polyadenylation sequence (Geraghty et al., 1982;
Spena et al., 1982). The remaining clone, cZ22C2, represented
an mRNA that was polyadenylated after a second polyadenylation
site (Fig. 2). Interestingly, the sequence that Spena et
al. (1982) determined for the cDNA clone, zA1, is identical to
that of cZ22C2 up to the point at which zA1 ends in a poly(A)
tail. Thus, it appears as though there are preferred polyadenylation
sites, their ...

ABSTRACT The maize floury2 mutation results in the
formation of a soft, starchy endosperm with a reduced amount
of prolamin (zein) proteins and twice the lysine content of the
wild type. The mutation is semidominant and is associated
with small, irregularly shaped protein bodies, elevated levels
of a 70-kDa chaperone in the endoplasmic reticulum, and a
novel 24-kDa polypeptide in the zein fraction. The 24-kDa
polypeptide is a precursor of a 22-kDa α-zein protein that is
not properly processed. The defect is due to an alanine-to-valine
substitution at the C-terminal position of the signal
peptide, which causes the protein to be anchored to the
endoplasmic reticulum. We postulated that the phenotype
associated with the floury2 mutation is caused by the accumulation
of the 24-kDa α-zein protein. To test this hypothesis,
we created transgenic maize plants that produce the mutant
protein. We found that endosperm in seeds of these plants
manifests the floury2 phenotype, thereby confirming that the
mutant α-zein is the molecular basis of this mutation.



Zeins are prolamin storage proteins that accumulate in the
endosperm of maize (Zea mays L.) seeds. They are composed
of four different types of polypeptides, classified as α-, β-, γ-,
and δ-zeins (1). Accretions of zein proteins form spherical
protein bodies within the lumen of the endoplasmic reticulum
(ER), and there is a distinct spatial arrangement of these
proteins within a protein body: β- and γ-zeins are located on
the periphery, whereas α- and δ-zeins are found in the interior
(2, 3). Collectively, the zein proteins are rich in glutamine and
proline, but they lack lysine and tryptophan. Because zeins
constitute such a large proportion of the total seed protein
(60-70%), the amino acid composition of these proteins causes
the grain to be of inferior nutritional quality for monogastric
animals.

Efforts to improve the protein quality of maize seed have
focused on mutants in which zein synthesis is reduced and the
lysine content is increased. The first "high-lysine" mutants to
be identified were opaque2 (o2) and floury2 (fl2) (4, 5).
Unfortunately, the favorable nutritional quality of these mutants
is offset by the inferior physical properties of their soft,
starchy endosperms. It appears that the starchy endosperm of
the o2 and fl2 mutants is caused by changes in the nature of
their protein bodies. The o2 mutation affects a transcriptional
activator of a subset of α-zein genes, leading to a reduction in
α-zein protein synthesis and the formation of protein bodies
that are significantly smaller than normal (6-9). The fl2
mutation, which is semidominant, causes a decrease in synthesis
of all classes of zeins, and the resultant protein bodies are
not only smaller than normal, but they are also asymmetrical
and misshapen (10, 11). Another feature of fl2 endosperm is
the overexpression of the ER-resident binding protein (BiP),
which becomes deposited at the periphery of the mutant
protein bodies (12-15).

We have postulated that the phenotype associated with fl2
is caused by the accumulation of a novel 24-kDa α-zein protein
(16, 17). This hypothesis is partially based on tight linkage
between the gene encoding the 24-kDa protein and the fl2
locus, but it is also consistent with the abnormal structure of
the protein. The mutant protein is 2 kDa larger than expected,
because of a defect in its processing following targeting to the
ER lumen. An alanine-to-valine substitution at the C-terminal
position of the signal peptide prevents its removal, thereby
anchoring the protein to the lumenal face of the ER membrane
(18). To investigate whether this protein is responsible for the
mutant phenotype, we transformed normal maize plants with
the gene encoding the mutant 24-kDa α-zein protein. We show
here that seeds of these plants manifest the key phenotypic
characteristics associated with the fl2 mutation.

MATERIALS AND METHODS 

Transformation of Maize Plants. Maize embryos were
cotransformed with the 24-kDa α-zein gene in plasmid
pCC515 (17) and the bacterial bialaphos (BAR) gene (19) as
a selectable marker. The 24-kDa α-zein gene in pCC515 is
flanked by 3.0 kb of 5' and 3.7 kb of 3' noncoding sequences.
The selectable marker gene plasmid is of the form
ubi::ubi intron::BAR::pinII, where ubi is a maize ubiquitin promoter
and first intron and pinII is the potato protease inhibitor
II 3' noncoding sequence. Plasmid DNAs were delivered by
microprojectile bombardment to embryogenically responsive,
immature embryos from the maize variety High Type II (20).
Embryos were recovered from herbicide-resistant calli and
grown to maturity. Transgenic plants were outcrossed as the
female parent to an inbred line, and the progeny were scored
for the floury kernel phenotype.

PCR Amplification of the 24-kDa α-Zein Gene. Genomic
DNA was extracted from seedlings germinated from normal
and floury kernels (21). The DNA was amplified by PCR using
primers corresponding to the signal peptide coding sequences
of the 24-kDa α-zein gene (5'-GCCCTTTTAGTCAGCGCAACAAATGTG-3')
and coding sequences for the seventh
α-helical repeat of the protein (5'-GCAGGGTTTGCCATAGCTAGCTGATG-3').
Products were separated by electrophoresis
in 1% agarose gels, stained with ethidium bromide,
and photographed with a Polaroid DS-34 camera.
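
As a simple check on the two primers quoted above, the following Python sketch reports their length and G + C
content and provides a reverse-complement helper. The variable and function names are hypothetical. The
second primer is entered on the assumption that the two unreadable characters in the scanned sequence are T, so
its values should be treated as provisional.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def gc_fraction(seq):
    """Fraction of G and C residues in a DNA sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


# Primer for the signal peptide coding sequence, as quoted in the text.
SIGNAL_PEPTIDE_PRIMER = "GCCCTTTTAGTCAGCGCAACAAATGTG"
# Primer for the seventh alpha-helical repeat; two characters are assumed to be T.
REPEAT7_PRIMER = "GCAGGGTTTGCCATAGCTAGCTGATG"

for name, primer in (("signal peptide", SIGNAL_PEPTIDE_PRIMER),
                     ("repeat 7", REPEAT7_PRIMER)):
    print(name, len(primer), round(gc_fraction(primer), 2), reverse_complement(primer))
```

The reverse complement is useful when locating the downstream primer site on the coding strand of a reference
sequence.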

Protein Extraction from Maize Flour and Immunoblotting.
A portion of each endosperm was cut from the seed prior to
germination and converted into a fine flour using a ball mill.
Proteins were extracted from the meal and separated according
to their solubility in 70% alcohol (22). Alcohol-soluble
(zein) proteins were separated by SDS/PAGE, blotted onto
nitrocellulose, and immunoreacted with rabbit anti-α-zein
polyclonal antibody (23). Alcohol-insoluble proteins were
separated by SDS/PAGE, blotted, and immunoreacted with a
rabbit anti-BiP polyclonal antibody (12). Goat anti-rabbit
alkaline phosphatase conjugate was used for indirect detection
of α-zein and BiP on the immunoblots (24).

Fixation and Embedding of Endosperms and Electron
Microscopy. Seeds were harvested 18 days after pollination
from a self-fertilized maize plant that was hemizygous for the
24-kDa α-zein transgene. Endosperms were fixed, embedded,
sectioned, and viewed with a transmission electron microscope
as described elsewhere (11).

RESULTS 

Detection of the Transgene in Transformed Maize Seedlings.
Transgenic maize plants were generated by a biolistic
method using microprojectiles coated with plasmid pCC515
(17), which contains the 24-kDa α-zein gene within a genomic
DNA fragment, and a plasmid containing the BAR gene (20),
which confers resistance to the herbicide Basta. Plants from 25
herbicide-resistant events that were recovered from the transformation
were crossed as females to an untransformed inbred
line. F1 progeny from 17 of these crosses segregated approximately
1 to 1 for floury-appearing kernels, consistent with the
presence of a single site of transgene integration. To determine
whether this phenotype was associated with the insertion of the
24-kDa α-zein gene, we used PCR primers for the coding
region of the 24-kDa α-zein gene to amplify genomic DNA
from F1 seedlings. A 560-bp fragment was produced from
DNA of the floury seedlings (Fig. 1A, lanes 4, 6, and 8), but
not the wild-type seedlings (Fig. 1A, lanes 3, 5, and 7). A
fragment of similar size was amplified from W64Afl2 DNA
(Fig. 1A, lane 9), but not from DNA of untransformed
seedlings (Fig. 1A, lanes 1 and 2).
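
A 1 to 1 segregation of the kind described above is commonly evaluated with a chi-square goodness-of-fit test
with one degree of freedom. The Python sketch below shows the arithmetic. No kernel counts are reported in the
text, so the counts used here are hypothetical, as is the function name.

```python
import math


def chi_square_1to1(count_a, count_b):
    """Chi-square goodness-of-fit test of two observed classes against a 1:1 ratio.

    Returns (statistic, p_value). With one degree of freedom the upper-tail
    probability of the chi-square distribution equals erfc(sqrt(statistic / 2)).
    """
    total = count_a + count_b
    if total == 0:
        raise ValueError("at least one observation is required")
    expected = total / 2
    statistic = ((count_a - expected) ** 2 + (count_b - expected) ** 2) / expected
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical ear: 52 floury and 48 normal kernels.
statistic, p_value = chi_square_1to1(52, 48)
print(round(statistic, 2), round(p_value, 3))
```

A p-value above the chosen threshold (conventionally 0.05) means that the counts do not depart detectably from
1 to 1, which is the outcome expected for a hemizygous plant carrying the transgene at a single locus and
outcrossed to an untransformed line.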

Immunodetection of α-Zein Proteins from Seeds of Transgenic
Plants. An immunoblot of α-zein proteins from mature
seeds was prepared to determine whether insertion of the
24-kDa α-zein gene resulted in synthesis of this protein (Fig.
1B). The blot shows that the appearance of a 24-kDa protein
band, indicated by arrowheads in lanes 4, 6, and 8 of Fig. 1B,
was always associated with the floury phenotype and insertion
of the transgene. The 24-kDa protein band in these samples is
similar to a band of identical molecular mass that was found in
W64Afl2 zein (indicated by an arrowhead in lane 9 of Fig. 1B).
The 24-kDa α-zein was not present in samples from normal
progeny or seeds of untransformed plants (Fig. 1B, lanes 1, 2,
3, 5, and 7). Three bands with molecular masses greater than
24 kDa were detected in all of the samples. In W64Afl2, these
polypeptides are glycosylated forms of 19-kDa α-zein proteins
that are products of a gene(s) closely linked to the fl2 locus
(J. W. Gillikin and R. S. Boston, personal communication).
Apparently, the same glycoproteins are present in the maize
line used for the transformation, although they had no effect
on the kernel phenotype.

Analysis of BiP Expression in Seeds of Transgenic Plants.
One phenotypic characteristic associated with the fl2 mutation
is overexpression of the 70-kDa ER-resident chaperone, BiP.
Using anti-BiP antibody, an immunoblot of the alcohol-insoluble
proteins showed that the amount of BiP in mature
transgenic kernels was comparable to that of W64Afl2 (Fig.
1C, lanes 4, 6, 8, and 9). Normal quantities of BiP were
detected in wild-type siblings and untransformed kernels (Fig.
1C, lanes 1, 2, 3, 5 and 7).

Fig. 1. Transformation of the 24-kDa α-zein gene into transgenic maize plants leads to synthesis of the encoded protein and overexpression
of BiP. Analysis of 3 of 17 independent transgenic lines is shown. One normal and one floury kernel from transgenic lines 236300 (lanes 3 and
4), 236133 (lanes 5 and 6), and 236294 (lanes 7 and 8) were analyzed and compared with kernels from the untransformed (lane 1, P1), the
untransformed parent outcrossed to a normal inbred (lane 2, P1 x P2), and W64Afl2 (lane 9) plants. (A) Using PCR primers specific to the 24-kDa
α-zein gene, a 560-bp DNA fragment was amplified from genomic DNA of seedlings germinated from floury kernels, but not wild-type kernels.
(B) An immunoblot of α-zeins shows a 24-kDa protein band (indicated by arrowheads) in samples from floury kernels, but not wild-type kernels.
(C) An immunoblot of the 70-kDa maize homolog of BiP.

Protein Body Structure in the Endosperms of Seeds of
Transgenic Plants. Another phenotypic characteristic associated
with the fl2 mutation is the formation of misshapen
protein bodies. Protein bodies in the developing endosperm of
seeds not expressing the 24-kDa α-zein gene are circular in
cross-section and are relatively discrete, similar to the wild type
(Fig. 2A; ref. 2), whereas those from seeds expressing the gene
have a convoluted shape and aggregate into large masses of
protein, similar to W64Afl2 (Fig. 2B; ref. 11). Analysis at
higher magnification revealed that protein bodies in the developing
endosperm of seeds not expressing the 24-kDa α-zein
gene contain darkly staining zein proteins primarily on the
periphery of the protein bodies, as is seen in wild type (Fig. 2C,
arrowheads; ref. 2). In contrast, those from endosperms expressing
the gene have many regions of darkly staining zein in
their interior (Fig. 2D, arrowheads; ref. 11), indicating an
abnormal organization of α-, β-, and γ-zeins within the protein
body.

DISCUSSION 

Previous studies provided strong genetic evidence implicating
a mutant 24-kDa α-zein protein as the cause of the phenotype
associated with the maize fl2 mutation (16, 17). The experiments
described here show that expression of the mutant gene
in transgenic maize plants leads to the accumulation of the
24-kDa α-zein protein, which is correlated to the overexpression
of BiP and the appearance of malformed protein bodies.
Because these characteristics of the seed from the transgenic
plants resemble the phenotype of the fl2 mutant, our observation
conclusively demonstrates that fl2 is a structural mutation
in a 22-kDa α-zein gene.

Fig. 2. Comparison of protein bodies from wild-type and transgenic maize endosperm. Protein bodies in the endosperm of wild-type seeds are
round (A and C) compared with the misshapen protein bodies in the endosperm of seeds expressing the 24-kDa α-zein gene (B and D). Comparison
of these protein bodies at lower magnification revealed that they appear as discrete spheres (circles in cross-section) in cells not expressing the 24-kDa
α-zein gene (A), whereas they have a convoluted shape and are aggregated in cells expressing this gene (B). Comparison of these protein bodies
at higher magnification highlights their differences in the internal structure. In cells not expressing the gene, the darkly staining zein proteins are
found primarily on the periphery of the protein bodies (C, arrowheads), whereas in cells expressing the gene, the protein bodies have many locules
of darkly staining zein in their interior (D, arrowheads). The irregularities seen in B and D are similar to those found in protein bodies of W64Afl2.
PB, protein body; RER, rough endoplasmic reticulum; S, starch grain. Bars = 0.5 μm.

The mechanism whereby the 24-kDa α-zein protein disrupts
the normal development of the protein body, leading to an
altered texture of the mature endosperm, is not known. One
possible explanation is that the mutant hydrophobic α-zein
protein is anchored to the ER membrane and remains on the
outer surface of the protein body, thereby disrupting protein
interactions that are important to the maintenance of the
spherical shape. Normally, α-zein proteins become sequestered
in the interior of the protein body (2). This hypothesis is
supported by evidence showing that the 24-kDa α-zein protein
is associated with the membrane fraction following translation
and translocation in the presence of microsomes (18).

Our results provide evidence of the utility of maize transformation
for the analysis of genetic mechanisms. The amount
of the 24-kDa α-zein protein detected in the transgenic floury
kernels and the W64Afl2 kernel was similar, suggesting that the
expression of the transgene may be comparable to that of the
native gene. This result is encouraging, because it demonstrates
that transgene expression in tissues of transformed
maize plants can be adequately controlled by native promoters.
Furthermore, the 3 kb of 5' and 3.7 kb of 3' noncoding
sequence included in the α-zein genomic clone must be
sufficient for directing appropriate temporal and spatial expression
in the endosperm; however, it is not known whether
the gene is transcribed in other tissues of the transgenic plants.

Traditionally, techniques for stable transfer of DNA to
monocotyledonous cereals have lagged behind similar work
with dicotyledonous plants (25). Transformation of dicot
species usually involves gene delivery through infection with
Agrobacterium tumefaciens, but monocots are not readily amenable
to infection by this bacterium. Although Ishida et al. (26)
reported stable maize transformants following infection by A.
tumefaciens, most successful transformations of maize plants
have used microprojectile bombardment to deliver DNA to
embryogenically responsive cells from immature embryos.
Since the first report of the generation of stable transgenic
maize plants using the biolistic method (27), this technique has
been used to introduce a number of agronomically important
traits into corn, such as insect resistance (20, 28), viral resistance
(29), and fructan production (30). These advances, along
with our report, attest to a promising future for the use of
transgenic technologies in the genetic study and agronomic
improvement of maize.

We have analyzed the genotypic diversity of Sugarcane yellow leaf virus (SCYLV) collected from North, South,
and Central America by fingerprinting assays and selective cDNA cloning and sequencing. One group of
isolates from Colombia, designated the C-population, has been identified as residing at the root node between
a separable superpopulation structure of SCYLV and other members of the family Luteoviridae, indicating that
the progenitor viruses of the North, South, and Central American isolates of the SCYLV superpopulation most
likely arose from a C-population structure. From a model of intrafamilial evolution (F. Moonan et al., Virology
269:156-171, 2000), a prediction could be made that within the SCYLV species, the capacity of genomic
sequence divergence would range from lowest in the capsid protein open reading frame 3 (ORF 3) to highest
in a region spanning across the carboxy-terminal end of the RNA-dependent RNA polymerase ORF. We have
demonstrated the validity and applicability of this intrafamilial model for the prediction of intraspecies SCYLV
diversity. Analysis of spatial phylogenetic variation (SPV) within the SCYLV isolates could not be assessed by
application of a "partial likelihoods assessed through optimization" (PLATO)-derived intraspecies model
alone. However, application of a PLATO-derived intrafamilial model with the intraspecies-derived model
allowed distinction of three forms of SPV. Two of the SPV forms identified correspond to the extremes in a
continuum of sequence evolution displayed in a SCYLV superpopulation structure, and the third form was
diagnostic of a C-population structure. The application of these types of models has value in terms of
predicting the types of SCYLV intraspecies diversity that may exist worldwide, and in general, may be useful
in application for more informed design of transgenes for use in the elicitation of homology-dependent virus
resistance mechanisms in transgenic plants.



In the Americas, Sugarcane yellow leaf virus (SCYLV) is
associated with a disease referred to as yellow leaf syndrome
(YLS [3, 5, 6, 47, 55]), although a similar disease via phytoplasma
infection may produce otherwise identical symptomology
and is also currently referred to as YLS disease in certain
areas of the world (6, 45). In sugarcane, losses of as high as
50% have been estimated to have occurred in field sites as the
result of virus-induced YLS (55).

The complete SCYLV genome has been sequenced and
characterized and most closely resembles viruses of the family
Luteoviridae in the genus Polerovirus (36, 48). The results of
these studies indicate that SCYLV, like Polerovirus members
of the family Luteoviridae, encodes at least six definable open
reading frames (ORFs) typically listed as ORFs 0 to 5 (4, 7, 31,
32, 35, 36, 43, 48). ORFs 0 and 1 are thought to produce
peptides via alternate translational start sites from the monopartite
genomic RNA of SCYLV, and ORFs 3 and 4 are
thought to produce peptides via alternate translation from a
subgenomic RNA. These studies also indicate that a fusion
peptide comprising a sequence encoded by ORF 2 is produced
by a -1 frameshift during ORF 1 translation and that a fusion
peptide comprising ORF 5 is produced via a translational
readthrough of the peptide encoded by ORF 3, a finding consistent
with the genome organization of well-characterized
members of the genus Polerovirus (4, 7, 31, 32, 35, 36, 43, 48).
The SCYLV peptides encoded by ORFs 0 and 1 are of unknown
function and a protease, respectively, and the peptides
encoded by ORFs 3, 4, and 5 appear to be structural proteins
comprising the virion particle, with the ORF 3 sequence the
primary capsid protein (36, 48). The peptide sequence encoded
by ORF 2 appears to be multifunctional, including sequence
for both an RNA-dependent RNA polymerase (RdRp) and a
putative genome linked viral protein (VPg), and the VPg peptide
is thought to be processed from this multifunctional peptide
by proteolytic cleavage and covalently conjugated to the 5'
terminus of the SCYLV genome, similar to that in members of
the genus Polerovirus (4, 7, 31, 32, 35, 36, 43, 48).

SCYLV as defined by the Seventh Report of the Interna- 
tional Committee on the Taxonomy of Viruses (54) is an un- 
classified member of the family Luieoviridae, although its overall 
genome structure most closely resembles that of members of the 
genus Polerovirus (36, 48). The Luteoviridae, as currently de- 
fined, include the genera Luteovirus^ Polerovirus, and Enamo- 
virus (54). The structure and classification of the sequences 
encoding the RNA-dependent RNA polymerase of the Luteo- 

yiridae have served as the primary basis for generic distinction 
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TABLE 1. Origins of North, South, and Central American isolates of SCYLV and nature of the genotypic data collected for this stud/ 



Accession Sugarcane variety 



Geographic origin 



Primary gcnotyping source(s) 



GenBank accession no. 



T6 CP65-357 Weslaco, Tex. 

Fl CP65-357 Canal Pi.. Fla. 

F2 CP72-2086 Ocwiston, Fla. 

LI LHo83-153 Baton Rouge. La. 



BO SP71-6163 Campinas, Sao Paulo. Brazil 

Bl SP71-6163 Piracicaba, Sao Paulo. Brazil 



B2 SP71-1406 Piracicaba, San Paolo, Brazil 

N6 Q136 Santa Rosa, Argentina 



02 CP92-1654 . Santa Lucia, Guatemala 



03 Q50 Santa Lucia^jGtfatemala 

06 PR7619 S^nta Luda. Guatemala 

07 NA56-75 Santa Lucia, Guatemala 

08 Ragnar Santa Lucia, Guatemala 

09 PR1013 Santa Lucia, Guatemala 
CI SP71-6163 Cali. Colombia 



C3 CC84.75 Cali, Colombia 



C4 CC85-964 Cali, Colombia 



Complete genome SCYLV-A 

Complete genome SCYLV-F 

CAPS markers of amplicons 

PROREP = pFM668(LM), pFM669(Ll-2); 

REPUTR = pFM676(LM), pFM677(LI-2); 

CPRT = pFM623(Ll) 
Piarlial cDNA sequences 
PROREP = pFM630{BM), pFM631(B1.2); 

REPUTR = pFM708(BM), pFM709(Bl-2); 

CPRT = pFM629(Bl) 
CAPS markers of amplicons 
PROREP = pFM688(N6-l). pFM689(N6-2); 

REPUTR = pFM686(N6-l), pFM687(N6-2); 

CPRT = pFM675(N6-l), pFM674(N6-2) 
PROREP - pFM693(02-l), pFM692(02-2); 

REPUTR = pFM702(G2O), pFM703(G2-2); 

CPRT = pFM696(G2-l), pFM697(02-2) 
. CAES jnarkers of amplicons 
CAPS markers of amplicons 
CAPS markers of amplicons 
CAPS markers of amplicons 
CAPS markers of amplicons 
PROREP = pFM715(Cl-l), pFM716(Cl-2); 

REPUTR = pFM704(Cl-l), pFM705(Cl-2); 

CPRT = pFM719(Cl-l); pFM720(Cl-2) 
PROREP = pFM608(C3-l), pFM609(C3-2); 

REPUTR = pFM724(C3-l), pFM725{C3-2); 

CPRT - pFM775(C3-l), pFM773(C3-2) 
PROREP = pFM670(C4-l), pFM671(C4-2); 

REPUTR = pFM708(C4-l), pFM709(C4-2); 

CPRT = pFM624(C4) 



AF157029 
AJ249447 
NA 

AF369923 



AF141385, AFl 60474 
AF369925 



NA 

AF369926 



AF369924 



NA 
NA 
NA 
NA 
NA 

AF369927 



AF369928 



AF369929 



GenBank accession numbers corresponding to the T6 (SCYLV-A), F1 (SCYLV-F), and B0 isolates have respectively been reported by Moonan et al. (36), Smith
et al. (48), and Maia et al. (29). The remaining GenBank accession numbers listed in the table are newly reported here. The accession designations listed in the table
are used in Fig. 1A-C, 2, 3, and 4, and the designations in parentheses listed under the genotyping source portion of the table were used to derive the paired data sets
of deduced peptide sequences analyzed in Fig. 1D and E from the indicated pFM plasmid clones.



within the family (7, 54). Based upon the classification system
of Koonin and Dolja (25), Luteovirus members have genomes
that encode RNA-dependent RNA polymerases classified as
supergroup II; Polerovirus members have RdRps classified as
supergroup I; and the sole member of the Enamovirus genus,
Pea enation mosaic virus 1 (PEMV-1), has an RdRp sequence
that contains sequences of both RdRp supergroup I and II
origin (36). The genomes of PEMV-1, Soybean dwarf virus, and
SCYLV exhibit spatial phylogenetic variation (SPV [18]) that
is thought to have arisen via recombination between polerovirus
and luteovirus ancestors after the divergence of these two
progenitor groups (4, 7, 17, 18, 31, 32, 35, 36, 43, 54).

Our previous comparisons of the SCYLV genome with other
members of the family Luteoviridae involved the development
of an intrafamilial model of SPV (36). Based upon this model,
we predicted that the SCYLV genomic sequence diversity
would be lowest in a region spanning the capsid protein ORF
3, low to intermediate in a region spanning the protease ORF
1 to within the RdRp-encoded ORF 2, and highest in a region
spanning from the RdRp to an untranslated sequence located
5' proximal to an ORF 3 and 4 polycistron. To test the
predictions of this model, we collected SCYLV isolates from field
sites throughout North, South, and Central America and
analyzed the genotypic diversity among these isolates. The results
of our analyses not only demonstrate the validity and
applicability of this model, in terms of predicting the types of SCYLV
intraspecific diversity, but have also aided us in identifying a
population of isolates from Colombia, which we refer to as the
C-population, that most likely represents the ancestral
population of all other sampled American SCYLV isolates.

MATERIALS AND METHODS

Plant materials, SCYLV infectivity diagnosis, and amplicon production. Plant
tissue samples were collected from sugarcane plants with leaf-yellowing
symptoms diagnostic of YLS. RNA extraction and purification from sugarcane tissue
samples for Northern blot analyses were performed essentially as described by
Ingelbrecht et al. (23) by using a double-stranded DNA probe derived from
pFM261 (36). A summary of the samples collected that were positive for SCYLV
infection is presented in Table 1. Amplicons were produced as diagnostic
fragments by using reverse transcription (RT)-PCR methods and were designated
PROREP (protease to replicase), REPUTR (replicase to untranslated region
and beginning of capsid protein), and CPRT (capsid protein to beginning of
capsid readthrough protein region). The positions of the PROREP, REPUTR,
and CPRT amplicons, in relation to the SCYLV-A genome, are shown in Fig. 4.
SCYLV first-strand cDNA was produced by RT of 3 µg of total plant RNA with
a Universal Riboclone cDNA Synthesis System (Promega, Palo Alto, Calif.), in
a 25-µl volume at 42°C and with 5 U of avian myeloblastosis virus reverse
transcriptase and 50 ng of oligonucleotide primers, according to the
manufacturer's instructions. The following oligonucleotide primers were used for the RT
reactions: oFM359 (5'-GCTCTCCACAAAGCTATCT-3'), oFM387
(5'-CTGACATTCCTTCGTGAGC-3'), and oFM361 (5'-TGTnTCACGATGTGGTTO-3').
RT reaction mixtures were diluted with an additional 12.5 µl of water and
then utilized in subsequent PCRs. For each accession, PCRs were done in
triplicate for each of the PROREP, REPUTR, and CPRT amplicons. PCRs were
done in 50-µl reaction volumes of 1× Qiagen Taq Polymerase Buffer with
MgCl2, 2 mM deoxynucleoside triphosphates, and 1 to 5 µl of first-strand cDNA
mixture. PCRs were run for 30 cycles of 95°C for 1 min, 52°C (PROREP) or 58°C
(REPUTR and CPRT) for 2 min, and 72°C for 2 min. PCRs were done to
produce the following amplicons with the following primer combinations:
PROREP with oFM359 and oFM323 (5'-CACACATTGCTGATTAC-3'), REPUTR
with oFM387 and oFM386 (5'-AGATACCnTGTCGAGAGC-3'), and CPRT
with oFM361 and oFM366 (5'-GCTCACGAAGGAATGTCAG-3'). Amplicons
were size fractionated on agarose gels and gel purified with a
Geneclean II Kit (Bio 101, Carlsbad, Calif.) according to the manufacturer's
instructions prior to their use in fingerprinting or cloning.
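
As a consistency check on the primer list, the following minimal Python sketch computes a reverse complement. It assumes the readings 5'-CTGACATTCCTTCGTGAGC-3' for oFM387 and 5'-GCTCACGAAGGAATGTCAG-3' for oFM366; the helper name reverse_complement is illustrative and is not part of the published methods. Under these readings the two primers are exact reverse complements, i.e., they anneal to the same site on opposite strands.

```python
# Illustrative helper (not from the paper). The two primer readings below
# are assumptions about the printed sequences.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA sequence."""
    seq = seq.upper()
    if set(seq) - set("ACGT"):
        raise ValueError("sequence contains characters other than A, C, G, T")
    return seq.translate(COMPLEMENT)[::-1]


OFM387 = "CTGACATTCCTTCGTGAGC"  # RT primer, also used for the REPUTR PCR
OFM366 = "GCTCACGAAGGAATGTCAG"  # second primer of the CPRT PCR

if __name__ == "__main__":
    # Prints True when the two readings are reverse complements.
    print(reverse_complement(OFM387) == OFM366)
```

The same check cannot be applied to oFM361 or oFM386 here, because their printed sequences contain characters that could not be resolved.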

Production and analysis of CAPS fingerprint gels. Amplicons generated in
triplicate, from independent PCRs, were used for DNA fingerprinting.
Gel-purified amplicon fragments were digested with both Sau3AI and TaqI
restriction enzymes to generate the SCYLV cleaved amplified polymorphic sequences
(CAPS), and these were radiolabeled by 5' overhang end-filling reactions with
Klenow enzyme (Promega) and dATP, dCTP, dTTP, and [α-32P]dCTP. CAPS
reactions were fractionated on 6% polyacrylamide sequencing gels under
standard denaturing conditions (44). Dephosphorylated HinfI-digested φX174 DNA
was labeled with [γ-32P]ATP and polynucleotide kinase and used as size markers.
A total of 65 CAPS allelic fragments were scored and ranged between 27 and 384
nucleotides (nt) in length: 17 CAPS allelic fragments for the CPRT amplicons, 22
for the REPUTR amplicons, and 26 for the PROREP amplicons.
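
The double digest and binary scoring can be illustrated with a short in silico sketch in Python. This is a simplified illustration under stated assumptions, not the procedure used in the study: it uses the standard recognition sites of Sau3AI (^GATC) and TaqI (T^CGA), measures top-strand fragment lengths only, and the function names and toy sequence are hypothetical.

```python
import re

# Recognition site and cut offset within the site (top strand).
# Sau3AI cuts before the G of GATC; TaqI cuts between T and C of TCGA.
ENZYMES = {"Sau3AI": ("GATC", 0), "TaqI": ("TCGA", 1)}


def cut_positions(seq, enzymes=ENZYMES):
    """Return sorted top-strand cut coordinates for a double digest."""
    seq = seq.upper()
    cuts = set()
    for site, offset in enzymes.values():
        # A lookahead pattern also finds overlapping occurrences.
        for match in re.finditer(rf"(?={site})", seq):
            cuts.add(match.start() + offset)
    return sorted(c for c in cuts if 0 < c < len(seq))


def fragment_lengths(seq, enzymes=ENZYMES):
    """Return fragment lengths, in 5'-to-3' order, after digestion."""
    bounds = [0] + cut_positions(seq, enzymes) + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:])]


def binary_scores(fragments_by_isolate):
    """Score presence (1) or absence (0) of each fragment length per isolate."""
    alleles = sorted({f for frags in fragments_by_isolate.values() for f in frags})
    scores = {
        isolate: [int(a in frags) for a in alleles]
        for isolate, frags in fragments_by_isolate.items()
    }
    return alleles, scores


if __name__ == "__main__":
    toy = "AAAAGATCAAAATCGAAAAA"  # hypothetical 20-nt amplicon
    print(fragment_lengths(toy))
```

Each column of the resulting 0/1 matrix corresponds to one scored allelic fragment; in the study, 65 such columns were scored across the three amplicons.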

cDNA cloning, sequencing, and computational analyses. CAPS markers
scored as binary data were analyzed by unweighted pair group method with
arithmetic mean (UPGMA) clustering methodology with Paup, version 4.0 beta 4
(51), and the haploid data analysis components of Popgene (57). Nei's (37)
gene diversity (h) and Nei's (39) pairwise genetic distances were calculated with
Popgene and analyzed individually, as a whole, and as geographic groupings.
Nucleotide and deduced peptide alignments and CAPS-associated files are
deposited at Treebase (www.herbaria.harvard.edu/treebase/index.html). PROREP,
REPUTR, and CPRT amplicons derived from one or two of the three available
independent PCRs were ligated into pCR4-TOPO (Invitrogen) to generate
independent clones for each of the amplicons. Only one of the CPRT amplicons of
the L1, B1, and C4 accessions was sequenced (Table 1, pFM623, pFM629, and
pFM624). All other amplicons were cloned and sequenced in duplicate. The
consensus sequences of the SCYLV isolates are deposited in GenBank and are
listed in Table 1. The T6 accession represents the source material for SCYLV-A
(36). Source material for SCYLV-F (48) from variety CP65-357 was obtained
and designated the F1 isolate of this study. The sequence from a Brazilian isolate
described by Maia et al. (29) was designated the B0 isolate. Outgroup
comparisons, nucleotide alignments of the SCYLV data set with other species of the
Luteoviridae, and the description of the origins of these genomic sequences are
reported in the study by Moonan et al. (36). DNA sequence generation was
performed on a contract basis by the DNA Sequencing Facility at Iowa State
University, Ames. Plasmid insert sequences were assembled with Seqman II
(DNAstar, Inc., Madison, Wis.). Nucleotide multiple sequence alignments were
analyzed with Paup 4.0 beta 4 or beta 8 (51), MEGA 2.0 (26), PHYLIP 3.57 (15),
and the neighbor-joining (NJ) method of CLUSTAL X (52, 53). Deduced
peptide sequences were analyzed with the NJ method of CLUSTAL X and the
quartet maximum-likelihood-based method of PUZZLE (50). PUZZLE
analyses were done with both the Dayhoff and Jones, Taylor, and Thornton (JTT)
phylogenetic models with 1,000 quartets for the partial deduced peptide sequences of
ORFs 1 and 5, as well as ORFs 3 and 4, and 10,000 quartets for ORF 2. For
phylogenetic analysis of nucleotide sequences with Splitstree 2.4 (22), distance
data were generated with the Hasegawa-Kishino-Yano (HKY) phylogenetic model
(20) of Paup 4.0 beta 8. Analyses for potential recombination sites were done by
"likelihood analysis of recombination in DNA" (LARD [21]) and the Recombination
in DNA Program (RDP [30]). Nucleotide sequences were analyzed for SPV by
the PLATO (18) and DNAml (15) maximum-likelihood methods and the NJ
method of CLUSTAL X. Because the consensus sequence for SCYLV-A
contains three alternate nucleotide substitutions in the PROREP amplicon-spanning
region and six alternate nucleotide substitution patterns
in the REPUTR amplicon-spanning region, the 8 and 64 sequentially derived
sequences for SCYLV-A were initially analyzed with the 14 individual sequences
of the other isolates assayed. The results (not shown) indicated that there was no
overlap of the different SCYLV-A combinations with the sequences of the other
isolates. By using composited pairs of the #1 and #8 PROREP and #1 and #64
REPUTR-spanning regions, which represented all individual nucleotide
substitutions in SCYLV-A, two 2,835-nt SCYLV-A sequences were generated and
referenced as T6-1 and T6-2. The transition/transversion ratio used in both
PLATO and DNAml was calculated with the 14 generated composite 2,835-nt clonal
sequences for the B1, N6, L1, G2, C1, C3, and C4 accessions, along with
the T6-1 and T6-2 sequences. The overall transition/transversion ratio for this
sequence range was calculated to be 1.5182, and this ratio was used with
DNAml (15) and the HKY phylogenetic model with PLATO.
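
For readers unfamiliar with the transition/transversion ratio, the sketch below counts the two classes of substitution between a pair of aligned sequences in Python. It is a simple count-based illustration only; the value of 1.5182 reported above was obtained with the phylogenetic software named in the text, whose estimate need not equal a raw pairwise count. The function names and test sequences are hypothetical.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def ts_tv_counts(seq_a, seq_b):
    """Count transitions and transversions between two aligned sequences.

    Sites with a gap or ambiguity code in either sequence are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == b or a not in "ACGT" or b not in "ACGT":
            continue
        # Purine<->purine or pyrimidine<->pyrimidine is a transition.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    return transitions, transversions


def ts_tv_ratio(seq_a, seq_b):
    """Return transitions / transversions, or None if no transversions."""
    ts, tv = ts_tv_counts(seq_a, seq_b)
    return ts / tv if tv else None


if __name__ == "__main__":
    print(ts_tv_counts("AACCGGTT", "GATCGTTA"))
```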

RESULTS 

Of 35 accessions collected, 16 samples tested positive for
SCYLV infection and are listed in Table 1. Accessions T3, T4,
and T5 (not listed) functioned as negative controls and
represent plants independently generated via meristem culture from
the T6 accession via previously described methods (46).
Numerous anecdotal reports suggest that SCYLV is probably
distributed worldwide. Our survey of the Americas confirms
the previous reports that SCYLV is endemic in all of the major
sugarcane-growing regions in the continental United States (5,
36, 45, 48), as well as in Brazil (29, 55), but also indicates that
the geographic range of this virus may now be formally
expanded to include Guatemala, Colombia, and Argentina.

Fingerprint analyses of isolates. A total of 65 CAPS allelic
fragments were scored in binary from the amplicons sampled,
and genotypic clustering was performed with the UPGMA
methodology of Paup. The resulting dendrogram is shown in
Fig. 1A. Fingerprint patterns for the T6, F1, F2, G3, G6, G7,
G8, and G9 amplicons were identical in triplicate, while
fingerprint patterns for the B2 and N6 isolates were identical in
triplicate, and these are represented in the dendrogram in Fig.
1A as a vertical bar subtended by concave branches. An
identical fingerprint patterning in triplicate implies an effective Nei
(37, 38, 39) pairwise genetic distance of zero, indicating that
groups with identical fingerprints are most likely representative
of the same genotype. The Cali, Colombia, isolate members
C1, C3, and C4, which resolved as a putative cluster in the
PAUP-derived UPGMA dendrogram in Fig. 1A, exhibited
differences in gene diversity as a group in comparison to the other
isolates. The overall gene diversity for all of the isolates was
calculated as h = 0.0471. In contrast, the C1/C3/C4 group gene
diversity was calculated as h = 0.0547, while that of the remaining
isolates was calculated as h = 0.00248, indicating that the bulk
of the gene diversity from the samples assayed was represented
primarily by the members of the C1/C3/C4 group of isolates.
Nei (39) pairwise genetic distances within
the C1/C3/C4 group and within the remaining isolates ranged from
0.0313 to 0.1313 and 0 to 0.0473, respectively, which reflects this
difference in calculated gene diversity measurements.
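
The gene diversity statistic h can be illustrated with a minimal Python sketch. It assumes the standard definition of Nei's gene diversity for haploid, two-state (band present or absent) markers, h = 1 - (p^2 + q^2) per fragment, averaged over fragments; whether Popgene applied exactly this form to the present data set is an assumption, and the function name and toy matrix are hypothetical.

```python
def nei_gene_diversity(matrix):
    """Mean Nei gene diversity over loci for binary haploid marker data.

    matrix: list of isolates, each a list of 0/1 scores (one per fragment).
    Per-locus diversity is 1 - (p**2 + q**2), where p is the frequency of
    band presence among isolates and q = 1 - p.
    """
    if not matrix or not matrix[0]:
        raise ValueError("matrix must contain at least one isolate and one locus")
    n_isolates = len(matrix)
    n_loci = len(matrix[0])
    total = 0.0
    for j in range(n_loci):
        p = sum(row[j] for row in matrix) / n_isolates
        q = 1.0 - p
        total += 1.0 - (p * p + q * q)
    return total / n_loci


if __name__ == "__main__":
    toy = [[1, 0], [1, 1], [1, 0], [1, 1]]  # 4 isolates, 2 fragments
    print(nei_gene_diversity(toy))
```

A group of isolates with identical fingerprints yields h = 0 under this definition, which is why a larger h for the C1/C3/C4 group indicates that most of the scored variation resides in that group.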

FIG. 1. Phylogenetic relationships assessed with both a UPGMA dendrogram derived from fingerprinting data (A) and phylograms derived
from nucleotide (B and C) and deduced peptide (D and E) data. The designations of the isolates and the origins of the sequence or fingerprint
data used in the diagrams are given in Table 1. Nucleotide sequence alignments were analyzed with the F84 phylogenetic model of DNAml of the
PHYLIP package, and deduced peptide sequence alignments were analyzed with the maximum-likelihood based quartet puzzling method of
PUZZLE. The nucleotide sequence ranges analyzed for the isolate phylograms (B and C), respectively, correspond to nt 1518 to 4352 and nt 3144
to 4238 of the SCYLV-A genome described by Moonan et al. (36). Relationships between the deduced peptide sequences of RdRp ORF 2 of
SCYLV are shown for both the complete RdRp peptide sequences (D) and a partial RdRp sequence corresponding to amino acid residue positions
471 to 572 of the RdRp of SCYLV-A (E).

Sequence analyses of isolates. Based upon the analysis of the
genotypes represented in Fig. 1A, amplicons for B1, N6, L1,
G2, C1, C3, and C4 were selected for DNA sequencing. From
the five or six independent amplicons cloned and sequenced
for each accession, which represented paired data sets, a 2,832-
or 2,835-nt consensus sequence for each isolate was then
derived, which corresponded to nt 1518 to 4352 of the SCYLV-A
(T6 isolate) genome (36). Alignment of these data sets
indicated that one predominant insert/deletion (INDEL) could
be identified in the alignment, which corresponded to one gap
of 3 nt, in the six individual paired data sets from the REPUTR
amplicons of the C1, C3, and C4 isolates. This INDEL
corresponded to nt 3512 to 3514 of the SCYLV-A genome,
which is positioned in an untranslated region of the genome,
between ORF 2 and ORF 3. Pairwise genetic distances were
calculated with the Jukes-Cantor (J-C) method of MEGA 2.0
with 1,000 replicates, and mean sequence divergence
measurements were calculated. The MEGA 2.0-calculated mean
sequence diversity for the set of nine consensus sequences,
excluding the B0 data, was 0.01835, with a standard error (SE) of
0.0166; the mean diversity of the C1/C3/C4 and T6/F1/L1/G2/
B1/N6 sequences as groups was 0.00557 (SE = 0.0084); and
between these same groups the nucleotide diversity was
0.01278 (SE = 0.00159). The calculated F84 and J-C pairwise
genetic distances within the C1, C3, and C4 isolates,
respectively, ranged from 0.0018 to 0.0440 and 0.0018 to 0.0160, and
within the T6, F1, L1, G2, B1, and N6 isolates they ranged
from 0.0036 to 0.0107 and 0.0036 to 0.0107.
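
The Jukes-Cantor correction used for these pairwise distances has a closed form, d = -(3/4) ln(1 - 4p/3), where p is the proportion of differing sites. The Python sketch below is a generic illustration of that formula, not the MEGA 2.0 implementation; the function names are hypothetical. It also confirms that the interval nt 1518 to 4352 spans 2,835 positions.

```python
import math


def p_distance(seq_a, seq_b):
    """Proportion of differing sites among unambiguous aligned positions."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguity codes
        compared += 1
        differing += a != b
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def jukes_cantor(p):
    """Jukes-Cantor distance for an observed proportion of differences p."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75) for the J-C correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# Inclusive length of the sequenced interval on the SCYLV-A genome.
REGION_LENGTH = 4352 - 1518 + 1

if __name__ == "__main__":
    print(REGION_LENGTH)
    print(jukes_cantor(p_distance("ACGTACGT", "ACGTACGA")))
```

At the small divergences reported here (on the order of 0.002 to 0.04), the corrected distance is only slightly larger than the raw proportion of differences.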

Both the NJ method of CLUSTAL X and the maximum-likelihood
method of DNAml were used to generate phylograms.
These were compared to the trees generated by the
paired sample sets. In either of these paired sample sets, or the
consensus sequences, there was little difference in the tree
topologies in the most common and most likely trees generated
by these methods. Figure 1B shows the resulting DNAml
phylogram with the consensus sequence data, which may be
compared to the fingerprint-derived data shown in Fig. 1A. A
comparison of the dendrogram in Fig. 1A and the phylogram
in Fig. 1B indicates that the results of the fingerprint and
sequence analysis demonstrate the same pattern of similarities
between isolates, supporting a contention that the C1/C3/C4
group of isolates sampled represents a population structure
separable from the T6/F1/F2/L1/G2/B1/B2/N6 group of
isolates sampled.

Partial sequences from an isolate of SCYLV from Campinas,
Sao Paulo, Brazil, listed in Table 1 as the B0 isolate were
analyzed in a similar fashion and represented nt 3144 to 4238 of
the SCYLV-A (T6 isolate) genome. These B0 data, produced
by Maia et al. (29), included the sequence spanning a small
portion of SCYLV ORF 2 and the complete ORFs 3 and 4 of
SCYLV, which are encoded in a polycistron within the SCYLV
genome. The generated alignments indicated an INDEL of a
single A nucleotide corresponding to a nucleotide position
between nt 3623 and nt 3624 of the SCYLV-A (T6 isolate)
genome, which is positioned in the untranslated region
between ORF 2 and ORF 3. The resulting B0 DNAml phylogram
for the SCYLV-A corresponding nucleotide range from 3144
to 4238 is shown in Fig. 1C. As shown in Fig. 1C, the C1, C3,
and C4 isolates from this analysis resolved as a clade most
closely associated with the B0 isolate. Based upon both the
fingerprint and sequence-derived analyses, the Cali, Colombia,
C1/C3/C4 sampled group could be considered as a separable
and distinct geographic population set from the sampled T6/
F1/F2/L1/G2/B1/B2/N6 geographic population set. Arbitrarily,
we assigned the designation of C-population to the C1/C3/C4
population set to reflect its Colombian origins and the
designation "superpopulation" to the T6/F1/F2/L1/G2/B1/B2/N6
population set to reflect its larger size from the pool of the
overall samples of isolates studied, the apparent genetically
contiguous characteristics of the isolates sampled and
analyzed, as well as the widespread geographic distribution of the
isolates sampled and studied, which spanned geographic
locales in both North and South America.

To determine the degree to which the differences in
nucleotide sequence corresponded to differences in deduced
peptide sequence composition of the complete and partial ORF
sequences encoded, we compared the deduced peptide
sequences of partial ORFs 1 and 5 and the complete ORFs 2, 3,
and 4 with alignments in which the B0 isolate-derived data were
excluded (Fig. 1D) or included (Fig. 1E). Trees were generated
by the NJ method, as well as by the maximum-likelihood
method of PUZZLE. For the PUZZLE-generated trees, both
the Dayhoff and JTT phylogenetic models were used, but there
was no significant difference in tree topology seen with these
two methods. The trees from the ORF 3 and 4 deduced
peptide data, in which the B0 sequences were included or excluded,
exhibited no significant differences, with the SEs of the
branch lengths overlapping, and thus represented the tree
topology of a single clade for all of these assessments (data not
shown). The trees generated by analyses of the deduced
peptide sequence of ORF 2, however, consistent with our pairwise
comparisons, indicated that a separate clade composed of
sequences derived from the C1, C3, and C4 isolates could be
delimited from the remaining isolate sequences. Figure 1D
shows the PUZZLE-derived phylogram from the B0-excluded
alignment, and Fig. 1E shows the results from inclusion of the
B0 isolate data. As shown in Fig. 1D and E, the individual
deduced peptide sequences of the six C1, C3, and C4 sampled
sequences resolved in phylograms as a clade separable from
the remaining sequences, as did the nucleotide
sequence-derived phylograms (Fig. 1B and C).

FIG. 2. Splitstree network diagram derived from nt 1518 to 4352 of
the SCYLV-A genome, illustrating the most likely possible
phylogenetic relationships between isolates of SCYLV. The network was
produced in Splitstree from an HKY phylogenetic model produced with
PAUP and corresponds to the data used to produce the DNAml
phylogram illustrated in Fig. 1B. The designations of the isolates and
the origins of the sequences used to produce the network are given in
Table 1.

The isolates' pairwise genetic distances were calculated with
the same nucleotide sequence alignments used to produce Fig.
1B and 1C, with the HKY phylogenetic model of Paup 4.0 beta
8, and the resulting data were further analyzed with Splitstree
2.4. As shown in Fig. 2, Splitstree output from the B0-excluded
alignment, which represented the nucleotide range from nt
1518 to nt 4352 of the SCYLV-A genome, produced a network
diagram in which a series of short quadrangles are generated
that extend from the T6 isolate to the N6 isolate. This
quadrangle set is linked to a long quadrangle, which at the extreme
is associated with the C1, C3, and C4 isolate sequences. In the
three short quadrangles represented in Fig. 2, the T6 isolate is
positioned on one quadrangle, which is opposed by a B1/N6-
associated quadrangle, and a third quadrangle, with which the
G2/F1/L1 isolates are associated, may be joined to these two.
The C1 isolate is placed in this network within the
quadrangle split that leads directly to a structure of three short
quadrangles within the SCYLV superpopulation. Splitstree
output from a B0-inclusive alignment, which represented
nucleotides ranging from nt 3144 to nt 4238 of the SCYLV-A
genome, also produced a network diagram similar to that
shown in Fig. 2 (data not shown). Although the C-population
cluster from this B0-inclusive output was also anchored at a
node at which the C1 isolate was placed, rather than a
quadrangle extending from the C-population to the superpopulation
cluster, a single branch linked the two. In addition, rather
than a set of networked quadrangles at the superpopulation
basal node, the superpopulation isolates were distributed in a
radial tree topology extending from a central superpopulation
node and included the B0 isolate sequence.

FIG. 3. Splitstree network diagram derived from nt 1518 to 4352 of
the SCYLV genome, illustrating the most likely phylogenetic
relationships between the SCYLV isolates and other members of the family
Luteoviridae. The network was produced in Splitstree from an HKY
phylogenetic model produced with PAUP. The designations of the
isolates and the origins of the SCYLV sequences used to produce the
network are given in Table 1, and the origins of the Luteoviridae
sequences are described in Moonan et al. (36). Isolate clusters formally
designated as belonging to the Enamovirus, Polerovirus, and Luteovirus
genera are encircled, as are the postulated superpopulation and
C-population clusters of SCYLV. Within the network diagram,
assignment of the B0 isolate of Maia et al. (29) (in diamond) is based on
analyses of partial sequence information data shown in Fig. 1C and E,
and assignment of other isolates of SCYLV to the superpopulation (in
box) is based upon UPGMA analysis of fingerprint data, as shown in
Fig. 1A.

Sequences represented in Fig. 1B and 2 were aligned with
those of other members of the family Luteoviridae, with an
alignment that has been previously described used as a guide
(36). Pairwise genetic distances were calculated with the J-C
model of MEGA 2.0, the J-C and F84 phylogenetic models of
DNAml, and the HKY model of Paup 4.0 beta 8. Data from the
Paup 4.0 beta 8-produced model were analyzed with Splitstree
2.4. As shown in Fig. 3, output from Splitstree placed the C1
isolate sequence at a node of a quadrangle from which the C3
and C4 isolates branched and at a node placed directly at a
branch that extends into a node from which predominantly
Polerovirus genomic sequences were generated. This C1 isolate
node, as shown in Fig. 3, also leads directly to a node from
which two short branches lead to individually separable but
internally unresolvable clusters of the T6/F1/L1/G2 and the
B1/N6 isolate sequences.



Analyses of SPV. Using the paired data sets of 16 sequences,
we analyzed the 2,835-nt sequence range of SCYLV with both
LARD and RDP. LARD was initially used for detection of
recombination and SPV in Dengue virus (21), and RDP is a
program for aiding in the detection of recombination among a
set of aligned viral sequences (30). When combinations of
parental sequences were analyzed with both of these programs, no potential
recombination sites could be identified. By using the same
paired data set, we then performed an analysis of SPV in the
same fashion described by Moonan et al. (36), except that the
transition/transversion ratio utilized was 1.5182. From the
intraspecies PLATO analysis output, z values of >3.57 were
statistically significant and are shown in Fig. 4A, below the
corresponding PLATO output from the intrafamilial derived
range used with the SCYLV-A genome (36). NJ method
results and DNAml trees were generated and analyzed with the
intraspecies derived PLATO output ranges, but the
corresponding trees for these ranges showed little congruence in
tree topologies generated by these two phylogenetic methods.
The sequence ranges were then reanalyzed by using the
corresponding breakpoints extrapolated from the Luteoviridae
sequence alignment-derived intrafamilial PLATO output (36),
along with the breakpoints generated from the intraspecies
PLATO output. The results, shown in Fig. 4, indicate that SPV
could be delimited within the SCYLV isolates by utilizing this
approach. From the side-by-side individual tree comparisons
derived from the NJ method and DNAml tree topologies, four
tree topologies could be delimited, which are shown in Fig. 4B.
Analyses of the resulting trees indicated that the overall
distribution of SPV allowed the assignment of sequences to three
possible groups: (i) group A, containing the T6, F1, L1, and G2
isolate sequences; (ii) group B, containing the B2 and N6
isolate sequences; and (iii) group C, containing the C1, C3, and
C4 isolate sequences. These groups were delimited by the four
tree topologies shown in Fig. 4B: (i) an I-tree ("I" for
indistinguishable), in which there was no statistical clustering of a
sequence range into any cluster beyond a single clade; (ii) a
B-tree, in which B group designates clustered significantly
differently from the other isolates, which among themselves
showed no significant statistical difference in terms of genetic
distances; (iii) a C-tree, in which C group designates clustered
significantly differently from the other isolates, which among
themselves showed no significant statistical difference in
clustering; and (iv) an X-tree, in which the group A, group B, and
group C isolate clusters were separated in a trichotomy.

DISCUSSION 

SCYLV population structures may be delimited, and their
evolutionary relationships inferred, by utilizing phylogenetic
models. In terms of plant viral strain discrimination, the most
common molecular criterion used is based on comparisons of
nucleotide sequence identity, as well as the derived deduced
peptide sequence similarities of complete ORFs. The deduced
peptide sequences most commonly used in the
discrimination of strains are those of the RdRp and capsid proteins.
Our sampling of genotypic diversity of SCYLV has thus
initially involved the cloning and characterization of regions of
the genome spanning these ORFs, and ca. 48% of the SCYLV
genome has been sampled in these analyses. In part, we
collected and analyzed SCYLV accessions from North, South,
and Central America to determine whether we could
discriminate, by using molecular criteria, new strains of the virus. The
partial ORF 1 and 5 deduced peptide sequence comparisons
revealed up to 5% amino acid sequence differences (data not
shown), but interpretations from the small lengths of the
deduced peptides from these partial sequences should be
considered dubious. Pairwise comparisons of the deduced peptide
sequences of ORF 2 indicated some differences which might be
indicative of an amino acid identity difference of as high as 5%
between members of the SCYLV superpopulation and the
C-population. However, the estimates of SEs from the within-
accession measurements and between-accession estimates
indicated overlap by an SE in each of the instances, and
therefore interpretations of strain distinction from our sampling are
questionable. Our results suggest that isolates of the
C-population represent a different strain from the SCYLV
superpopulation, but resolution of this issue will require either increased
sampling or the addition of pertinent biological data.

The use of phylogenies based on molecular data is a
commonly accepted approach to defining population structures
(16, 19, 24, 37-41), and the advantages of using phylogenies in
this fashion have been described (14, 40). Migration of RNA
viruses has been shown to fix classes of genomes that may
exhibit variable degrees of fitness (11). In terms of RNA
viruses, most migrations involve either vector transmissions or
dispersal of small samples of infected organisms or
virus-carrying vectors to new geographic locales (8-11). The lineage of
these viruses may be traced by various means, including the
identification of dispersed recombinant virus forms (18, 21). In
the Luteoviridae, evidence exists that RNA recombination has
occurred episodically at two levels. The first-level episodic
RNA recombination event(s) accommodates the divergence of
the two major classes of Luteoviridae genomes that are
currently represented by the Luteovirus and Polerovirus genera:
Luteovirus members produce an RNA-dependent RNA
polymerase protein, lack a VPg protein, and have a 5' gene
organization most closely allied with members of the Sobemovirus
genus, whereas Polerovirus members express a VPg, as well as
an RdRp most closely allied with the RdRp proteins expressed
by members of the Dianthovirus or Carmovirus genus (4, 7, 31,
32, 35, 36, 43). At a secondary level, evidence indicates that
episodic recombination between polerovirus-polerovirus (17)
and luteovirus-polerovirus ancestors (36, 43, 48) has occurred
since the divergence of these two groups (4, 7, 31, 32, 35).

Overall, the phylogenetic analysis-based data we present in 
this study indicates that a majority of the isolates we studied 
are associated with a superpopulation structure of SCYLV. 
The results also indicate that the C-population of SCYLV 
represents a population structure from which the SCYLV su- 
perpopulation most likely emerged. Two primary hypotheses 
might be considered to explain the phylogenetic relationship 
between the SCYLV populations we refer to as the superpopu- 
lation and the C-population: (i) the superpopulation was de- 
rived from a separable and distinct C-population via either 
founder effects or genetic drift or (ii) the C-population was 
derived from the superpopulation via recombination with 
.some unknown Luteoviridae family member that exists at a 
closer genetic distance to the main Polerovirus clade, to which 
SCYLV is most closely allied. The evidence primarily supports 



the first conclusion. Ilrst, our attempts to use different meth- 
ods of computational technology, i.e.. to determine whether 
the SPV we observed is derived from recombinatorial events, 
indicated that there was no significant evidence for detectable 
recombination events with the isolates sampled and analyzed. 
Second, a comparison of the translated ORF 2 (RdRp) gene 
products from the C-population in relation to the sampled 
nucleotide sequences indicated that the majority of those se- 
quence changes in the C-population are silent in reference to 
the same homologous sites in the SCYLV superpopulation. 
Third, our current sampling methods indicate that the isolates 
that constitute the C-population have higher estimates of gene 
diversity than the isolates from the superpopulation, which 
would be a predictable observation if founder effects were 
involved in a hypothesized evolution of the SCYLV super- 
population from a C-population member or progenitor. The 
mean Nei (37) gene diversity estimates derived from fingerprint
data are h = 0.0248 for the superpopulation set and h =
0.0547 for the C-population set for all of the CAPS markers. 
The range of genetic distances measured from the nucleotide 
data also supports this conclusion. 
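
The gene diversity statistic quoted above can be illustrated with a minimal sketch. It assumes the per-marker form of Nei's estimator, h = 1 - sum(p_i^2), averaged over markers; the exact estimator and software applied to the fingerprint data are not stated here, and the band patterns below are hypothetical, not the study's data.

```python
from collections import Counter


def nei_gene_diversity(patterns):
    """Nei's gene diversity for one marker: h = 1 - sum(p_i^2),
    where p_i is the frequency of the i-th pattern among isolates."""
    n = len(patterns)
    if n == 0:
        raise ValueError("need at least one isolate")
    counts = Counter(patterns)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def mean_gene_diversity(markers):
    """Average h over a list of markers (each a list of patterns)."""
    if not markers:
        raise ValueError("need at least one marker")
    return sum(nei_gene_diversity(m) for m in markers) / len(markers)


# Hypothetical CAPS-style data: one list per marker, one entry per isolate.
# "a" and "b" are illustrative restriction patterns, not observed ones.
uniform_marker = ["a"] * 10            # every isolate shows the same pattern
variable_marker = ["a"] * 9 + ["b"]    # one isolate differs

print(nei_gene_diversity(uniform_marker))
print(nei_gene_diversity(variable_marker))
print(mean_gene_diversity([uniform_marker, variable_marker]))
```

Under this definition a set of isolates that share nearly all patterns yields h close to 0, so a founder-derived population would be expected to show the lower of two estimates.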

Based upon the SPV shown in Fig. 4, the dendrogram in Fig. 
1A, and the Splitstree network in Fig. 3, the B0 and B2 isolates
could be assigned in a classification scheme as a population
subgroup, group B, which along with the B1 and N6 isolates
represents one end of a derived continuum of sequence evolution
within the SCYLV superpopulation structure. Based
primarily on the SPV shown in Fig. 4, the T6, F1, F2, L1, G2,
G3, G6, G7, G8, and G9 isolates could be considered as group 
A, constituting the other end of a continuum of isolates in the 
superpopulation. This partitioning scheme may be represented 
in the Splitstree network in Fig. 3 by the placement of the B0
(in diamond) and F2, G3, and G6 to G9 isolates (in box). This 
group A-group B partitioning scheme should be considered 
artificial. The positioning of the L1/G2 sequences in the Splitstree
network in Fig. 2 in a quadrangle adjoining the F1/T6 and
N6/B1 isolates suggests that the SCYLV superpopulation
members more accurately represent a continuum of sequence
evolution. The value of a group A-group B partitioning scheme
is in its merits as a guide toward developing molecular epidemiologic
models for the geographic migration and dispersal of
SCYLV. If a parsimonious philosophy (i.e., that the lowest 
number of steps leading to a result is the most likely path taken 
to arrive at that result) is applied to this somewhat artificial 
group classification, a progression of SCYLV evolution in 
which C-population isolates have given rise to group B-type 
isolates, from which group A-type isolates are further evolved, 
is the simplest conclusion. If we utilize the group A-group B 
superpopulation and C-population forms of classification, our
survey would indicate that group A isolates are distributed
between North America and Guatemala, that group B isolates
are distributed between Argentina and Brazil, and that the
C-population isolates have only been identified from Colombia.
None of our geographic sites yielded genotypes assignable
to more than one grouping from this somewhat artificial form 
of classification. Although these observations may be due to a 
lack of extensive enough sampling, they could also be the result 
of founder effect involvement in the intraspecies evolution of 
SCYLV. 

In this study, we have examined viral isolate relationships in 
part with network diagrams generated by Splitstree (22). The
diagrams generated by an analysis with Splitstree utilize a form 
of split decomposition analysis (1, 13) that is thought to pro- 
vide a more reliable interpretation of phylogenetic data, be- 
cause evolutionary data often contain conflicting phylogenetic 
data from which alternate and also highly likely phylogenetic 
relationships might not otherwise be illustrated (12, 13, 22, 27, 
28, 33). For example, in the network diagram in Fig. 2, a series 
of quadrangles are linked in relation to an observed continuum 
of sequence evolution of a superpopulation of SCYLV that 
extends from the T6 isolate to the N6 isolate. Three main 
quadrangles are represented, of which the T6 isolate is posi- 
tioned from one quadrangle which is opposed by a B1/N6 
associated quadrangle, and in which a third quadrangle may be 
joined to these two, in which the G2/F1/L1 isolates are associated.
Figure 2 also shows that although the C1 isolate may be
placed within a quadrangle node that leads to the central
quadrangles of the superpopulation structure, effectively
placed at what is presumed to be the shortest genetic distance 
and therefore most likely intersection with this SCYLV super- 
population structure, a second and also highly likely alternative 
leads to the T6 associated quadrangle. This is in contrast to the 
typical type of phylogram, such as that in Fig. 1B. Utilizing the
same alignment, but a different phylogenetic model, the L1 and
G2 isolates might be interpreted from Fig. 1B as being more
closely allied with the B1 and N6 isolates, whereas the network
diagram in Fig. 2 suggests that when other highly likely alter- 
natives are considered, this relationship is more ambiguous. 
When SCYLV isolates are analyzed with alignments generated 
with other members of the family Luteoviridae, as shown in Fig. 
3, a similar relationship between isolates may be interpreted. 
In both Fig. 2 and Fig. 3, the C1 isolate is positioned at a node
within a quadrangle from which the C3 and C4 isolates are 
generated. Based upon these data, an assumption that any one 
of the three Colombian isolates could represent the closest
C-population founder representative to the SCYLV super- 
population is also a reasonable interpretation. 
The implications of our observations of SCYLV intraspecies
evolution. Our analysis of SPV within North, South, and Central
American isolates of SCYLV has been based in part on data
derived from an intrafamilial model of SPV within the Luteo- 
viridae (36). A direct extrapolation of data from this intrafa- 
milial model to the SCYLV-A genome is shown in Fig. 4A. 
Based on this intrafamilial model, primers for the PROREP,
REPUTR, and CPRT amplicons were selected from corre- 
sponding regions of the SCYLV genome, which yielded am- 
plicons by annealing to sites with a low capacity for sequence 
evolution and yet produced amplicons that could each be pre- 
dicted to have different degrees of sequence diversity. Among 
the predictions that could be made from our original intrafa- 
milial analysis (36) are that the CPRT-spanning region of the 
SCYLV genome should exhibit a low sequence diversity, the 
PROREP-spanning regions should exhibit moderate diversity, 
and the REPUTR-spanning region should exhibit a higher 
diversity than the other two regions. As shown by our intraspe- 
cies analysis of SPV shown in Fig. 4A, the predictability of this 
intrafamilial analysis-based model of SPV is supported by the 
SPV that we detect from the SCYLV field isolates we have 
studied. The CPRT region exhibits low diversity, the PROREP 
region shows moderate diversity, and the REPUTR region
exhibits high diversity. SPV models of this type have utilitarian
value. They may be useful as criteria involved in the production 
of virus-resistant plants, utilizing transgenic methodologies 
that employ homology-dependent posttranscriptional gene-silencing
mechanisms as a primary component. Because this
method of plant protection is reliant upon transgene sequence 
homology with the targeted viral genome (2), the capacity for 
viral sequence evolution in the region represented by a trans- 
gene is a significant factor in developing ecologically safe and 
economically viable long-term virus resistance (34, 42, 49). For 
example, sugarcane plants expressing untranslated viral capsid 
sequences of Sorghum mosaic virus (SrMV) strain SCH (SrMV-
SCH), challenged with SrMV viruses of strains SCM (SrMV- 
SCM) and SCI (SrMV-SCI) and Sugarcane mosaic virus strain 
D (SCMV-D), show various levels of virus resistance that correlated
with the percentage of sequence identity of the transgenes
to the sequence of the challenging virus (23). The corresponding
homologous sequences of SrMV-SCM, SrMV-SCI,
and SCMV-D, respectively, have 95, 95, and 75% identity with
the equivalent SrMV-SCH sequence range (56). Challenge 
experiments with these same viruses resulted in protection of,
respectively, 17, 18, and 3 of 25 sugarcane plants, compared to
22 of 25 plants protected among those challenged with SrMV-SCH
(23). In many instances, sequence diversity data for a virus 
species is unavailable, but genomic information from related 
viruses is available. In these instances, an intrafamilial model 
could be constructed to predict genomic sequence regions ex- 
pected to have low diversity, which would be more appropriate 
choices to use as transgene sequences. 
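
The relationship between transgene sequence identity and protection in the cited challenge experiments (23, 56) can be tabulated with a short sketch. The identities and plant counts for SrMV-SCM, SrMV-SCI, and SCMV-D are the values quoted above; the 100% identity assigned to SrMV-SCH is an assumption made here because the transgene derives from that strain. With only four points the correlation coefficient is descriptive, not a statistical test.

```python
import math

# (strain, % identity to the SrMV-SCH transgene region, plants protected, plants challenged)
# The 100.0 for SrMV-SCH is an assumption; the other values are quoted in the text.
CHALLENGES = [
    ("SrMV-SCH", 100.0, 22, 25),
    ("SrMV-SCM", 95.0, 17, 25),
    ("SrMV-SCI", 95.0, 18, 25),
    ("SCMV-D", 75.0, 3, 25),
]


def protection_fraction(protected, challenged):
    """Fraction of challenged plants that were protected."""
    return protected / challenged


def pearson_r(xs, ys):
    """Pearson correlation coefficient computed from its definition."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


identities = [row[1] for row in CHALLENGES]
fractions = [protection_fraction(row[2], row[3]) for row in CHALLENGES]

for (strain, identity, protected, challenged), frac in zip(CHALLENGES, fractions):
    print(f"{strain}: {identity:.0f}% identity, "
          f"{protected}/{challenged} protected ({frac:.0%})")
print(f"Pearson r = {pearson_r(identities, fractions):.3f}")
```

A table of this kind makes the practical point of the SPV model concrete: a transgene drawn from a low-diversity region is more likely to retain high identity with, and therefore protection against, divergent field isolates.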